(54) DNA sequences encoding enzymes involved in production of isoprenoids

(57) The present invention is directed to an isolated 
DNA sequence coding for an enzyme involved in the 
mevalonate pathway or the pathway from isopentenyl 
pyrophosphate to farnesyl pyrophosphate, vectors or 
plasmids comprising such DNA, hosts transformed by 
either such DNAs or vectors or plasmids and a process 
for the production of isoprenoids and carotenoids by 
using such transformed host cells.

Description

[0001] The present invention relates to molecular biology for the manufacture of isoprenoids and biological materials useful therefor.

[0002] Astaxanthin is known to be distributed in a wide variety of organisms such as animals (e.g. birds such as flamingo and scarlet ibis, and fish such as rainbow trout and salmon), algae and microorganisms. It is also recognized that astaxanthin has a strong antioxidant property against oxygen radicals, which is expected to find pharmaceutical use in protecting living cells against diseases such as cancer. Moreover, from the viewpoint of industrial application, demand for astaxanthin as a coloring reagent is increasing, especially in the farmed fish industry (e.g. salmon), because astaxanthin imparts a distinctive orange-red coloration to the animals and contributes to consumer appeal in the marketplace.

[0003] Phaffia rhodozyma is known as a carotenogenic yeast strain which produces astaxanthin specifically. Unlike the other carotenogenic yeasts, Rhodotorula species, Phaffia rhodozyma (P. rhodozyma) can ferment some sugars such as D-glucose. This is an important feature from the viewpoint of industrial application. In a recent taxonomic study, a sexual cycle of P. rhodozyma was revealed and its teleomorphic state was designated under the name Xanthophyllomyces dendrorhous (W. I. Golubev; Yeast 11, 101-110, 1995). Some strain improvement studies to obtain hyperproducers of astaxanthin from P. rhodozyma have been conducted, but in this decade such efforts have been restricted to conventional mutagenesis and protoplast fusion. Recently, Wery et al. developed a host-vector system for P. rhodozyma in which a non-replicable plasmid is integrated into the genome of P. rhodozyma at the ribosomal DNA locus in multiple copies (Wery et al., Gene, 184, 89-97, 1997). Verdoes et al. reported further improved vectors for obtaining transformants of P. rhodozyma, as well as its three carotenogenic genes, which code for the enzymes that catalyze the reactions from geranylgeranyl pyrophosphate to β-carotene (International patent WO97/23633). The importance of genetic engineering methods in strain improvement studies of P. rhodozyma will increase in the near future, in order to break through the productivity reached by conventional methods.

[0004] It is reported that the carotenogenic pathway from a general metabolite, acetyl-CoA, consists of multiple enzymatic steps in carotenogenic eukaryotes, as shown in Fig. 1. Two molecules of acetyl-CoA are condensed to yield acetoacetyl-CoA, which is converted to 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) by the action of 3-hydroxy-3-methylglutaryl-CoA synthase. Next, 3-hydroxy-3-methylglutaryl-CoA reductase converts HMG-CoA to mevalonate, to which two phosphate residues are then added by the action of two kinases (mevalonate kinase and phosphomevalonate kinase). Mevalonate pyrophosphate is then decarboxylated by the action of mevalonate pyrophosphate decarboxylase to yield isopentenyl pyrophosphate (IPP), which is the building unit of the wide variety of isoprene molecules necessary in living organisms. This pathway is called the mevalonate pathway, after its important intermediate, mevalonate. IPP is isomerized to dimethylallyl pyrophosphate (DMAPP) by the action of IPP isomerase. Then, IPP and DMAPP are converted to a C10 unit, geranyl pyrophosphate (GPP), by head-to-tail condensation. In a similar condensation reaction between GPP and IPP, GPP is converted to a C15 unit, farnesyl pyrophosphate (FPP), which is an important substrate for the biosynthesis of cholesterol in animals and ergosterol in yeast, and for the farnesylation of regulatory proteins such as the RAS protein. In general, the biosynthesis of GPP and FPP from IPP and DMAPP is catalyzed by one enzyme called FPP synthase (Laskovics et al., Biochemistry, 20, 1893-1901, 1981). On the other hand, in prokaryotes such as eubacteria, isopentenyl pyrophosphate is synthesized from pyruvate by a different pathway via 1-deoxyxylulose-5-phosphate, which is absent in yeast and animals (Rohmer et al., Biochem. J., 295, 517-524, 1993). In extensive studies of cholesterol biosynthesis, it was shown that the rate-limiting steps of cholesterol metabolism lie in this mevalonate pathway, especially in its early steps catalyzed by HMG-CoA synthase and HMG-CoA reductase. The inventors paid attention to the fact that the biosynthetic pathways of cholesterol and carotenoids share the intermediate pathway from acetyl-CoA to FPP, and tried to improve the rate-limiting steps in the carotenogenic pathway, which might lie in the mevalonate pathway, especially in its early steps such as those catalyzed by HMG-CoA synthase and HMG-CoA reductase, so as to improve the productivity of carotenoids, especially astaxanthin.
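The linear route described in paragraph [0004] can be written down as an ordered table, which makes it easy to see which enzymes separate any two intermediates. The following Python sketch is purely illustrative: the tuple layout, the `PATHWAY` name and the `enzymes_between` helper are assumptions introduced here, and the first enzyme is left as `None` because the text does not name it.

```python
# Each entry: (main substrate, enzyme, product), in the order given in [0004].
# Illustrative data structure only; it is not part of the patent disclosure.
PATHWAY = [
    ("acetyl-CoA", None, "acetoacetyl-CoA"),  # enzyme not named in the text
    ("acetoacetyl-CoA", "HMG-CoA synthase", "HMG-CoA"),
    ("HMG-CoA", "HMG-CoA reductase", "mevalonate"),
    ("mevalonate", "mevalonate kinase", "mevalonate phosphate"),
    ("mevalonate phosphate", "phosphomevalonate kinase", "mevalonate pyrophosphate"),
    ("mevalonate pyrophosphate", "mevalonate pyrophosphate decarboxylase", "IPP"),
    ("IPP", "IPP isomerase", "DMAPP"),
    ("DMAPP", "FPP synthase", "GPP"),  # condensation with one IPP (C10)
    ("GPP", "FPP synthase", "FPP"),    # condensation with one more IPP (C15)
]


def enzymes_between(start, end):
    """Return the enzymes on the linear route from `start` to `end`."""
    names = []
    collecting = False
    for substrate, enzyme, product in PATHWAY:
        if substrate == start:
            collecting = True
        if collecting:
            names.append(enzyme)
            if product == end:
                return names
    raise ValueError(f"no route from {start!r} to {end!r}")


if __name__ == "__main__":
    for name in enzymes_between("HMG-CoA", "IPP"):
        print(name)
```

Listing the steps this way also shows that FPP synthase appears twice, in line with the statement that one enzyme catalyzes both condensations.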

[0005] This invention was created on the basis of the above endeavor of the inventors. In accordance with this invention, the genes and the enzymes involved in the mevalonate pathway from acetyl-CoA to FPP, which are biological materials useful in the improvement of the astaxanthin production process, are provided. This invention involves cloning and determination of the genes which code for HMG-CoA synthase, HMG-CoA reductase, mevalonate kinase, mevalonate pyrophosphate decarboxylase and FPP synthase. This invention also involves the enzymatic characterization resulting from the expression of such genes in suitable host organisms such as E. coli. These genes may be amplified in a suitable host, such as P. rhodozyma, and their effects on carotenogenesis can be confirmed by cultivating such a transformant in an appropriate medium under appropriate cultivation conditions.

[0006] According to the present invention, there is provided an isolated DNA sequence coding for an enzyme involved in the mevalonate pathway or the reaction pathway from isopentenyl pyrophosphate to farnesyl pyrophosphate. More specifically, the said enzymes are those having an activity selected from the group consisting of 3-hydroxy-3-methylglutaryl-CoA synthase activity, 3-hydroxy-3-methylglutaryl-CoA reductase activity, mevalonate kinase activity, mevalonate pyrophosphate decarboxylase activity and farnesyl pyrophosphate synthase activity.

[0007] The said isolated DNA sequence may be more specifically characterized in that (a) it codes for the said enzyme having an amino acid sequence selected from the group consisting of those described in SEQ ID NOs: 6, 7, 8, 9 and 10, or (b) it codes for a variant of the said enzyme selected from (i) an allelic variant, and (ii) an enzyme having one or more amino acid additions, insertions, deletions and/or substitutions and having the stated enzyme activity. A particularly specified isolated DNA sequence as mentioned above may be one which can be derived from a gene of Phaffia rhodozyma and is selected from (i) a DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5; (ii) an isocoding or an allelic variant of the DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5; and (iii) a derivative of a DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5 with addition, insertion, deletion and/or substitution of one or more nucleotide(s), and coding for a polypeptide having the said enzyme activity. Such derivatives can be made by recombinant means on the basis of the DNA sequences disclosed herein by methods known in the state of the art and disclosed e.g. by Sambrook et al. (Molecular Cloning, Cold Spring Harbour Laboratory Press, New York, USA, second edition 1989). Amino acid exchanges in proteins and peptides which do not generally alter the activity are known in the state of the art and are described, for example, by H. Neurath and R. L. Hill in "The Proteins" (Academic Press, New York, 1979, see especially Figure 6, page 14). The most commonly occurring exchanges are: Ala/Ser, Val/Ile, Asp/Glu, Thr/Ser, Ala/Gly, Ala/Thr, Ser/Asn, Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro, Lys/Arg, Asp/Asn, Leu/Ile, Leu/Val, Ala/Glu, Asp/Gly, as well as these in reverse.
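The list of commonly occurring exchanges, read "as well as these in reverse", amounts to a set of unordered residue pairs. A minimal Python sketch of such a lookup follows; the names `EXCHANGES`, `PAIRS` and `is_listed_exchange` are hypothetical and chosen only for illustration, and being on the list is of course no guarantee that a given variant retains enzyme activity.

```python
# The seventeen exchanges listed in the text, using three-letter residue codes.
EXCHANGES = (
    "Ala/Ser Val/Ile Asp/Glu Thr/Ser Ala/Gly Ala/Thr Ser/Asn Ala/Val "
    "Ser/Gly Tyr/Phe Ala/Pro Lys/Arg Asp/Asn Leu/Ile Leu/Val Ala/Glu Asp/Gly"
)

# Unordered pairs, so that Ala/Ser and Ser/Ala are treated alike ("in reverse").
PAIRS = {frozenset(pair.split("/")) for pair in EXCHANGES.split()}


def is_listed_exchange(original, replacement):
    """True if swapping `original` for `replacement` is one of the listed exchanges."""
    return frozenset((original, replacement)) in PAIRS


if __name__ == "__main__":
    print(is_listed_exchange("Ser", "Ala"))  # listed, in reverse order
    print(is_listed_exchange("Trp", "Gly"))  # not on the list
```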

[0008] The present invention also provides an isolated DNA sequence, which is selected from (i) a DNA sequence represented in SEQ ID NO: 3; (ii) an isocoding or an allelic variant of the DNA sequence represented in SEQ ID NO: 3; and (iii) a derivative of a DNA sequence represented in SEQ ID NO: 3 with addition, insertion, deletion and/or substitution of one or more nucleotide(s), and coding for a polypeptide having the mevalonate kinase activity.

[0009] Furthermore, the present invention is directed to those DNA sequences as specified above and as disclosed, e.g. in the sequence listing, as well as their complementary strands, or those which include these sequences, DNA sequences which hybridize under standard conditions with such sequences or fragments thereof, and DNA sequences which, because of the degeneracy of the genetic code, do not hybridize under standard conditions with such sequences but which code for polypeptides having exactly the same amino acid sequence.
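The point about degeneracy of the genetic code can be illustrated with two short DNA strings that differ at the nucleotide level yet encode the same peptide. The Python sketch below builds the standard codon table and compares the translations; the example sequences are made up for illustration and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding strand, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    seq_a = "ATGGCTAAA"  # hypothetical example sequence
    seq_b = "ATGGCCAAG"  # synonymous codons for the same peptide
    print(translate(seq_a), translate(seq_b), count_mismatches(seq_a, seq_b))
```

Over a full-length gene, many such third-position differences can accumulate until the two strands no longer hybridize under standard conditions, even though the encoded protein is unchanged.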

[0010] "Standard conditions" for hybridization mean in this context the conditions which are generally used by a man skilled in the art to detect specific hybridization signals and which are described, e.g. by Sambrook et al., "Molecular Cloning", second edition, Cold Spring Harbour Laboratory Press 1989, New York, or preferably so-called stringent hybridization and non-stringent washing conditions, or more preferably so-called stringent hybridization and stringent washing conditions, which a man skilled in the art is familiar with and which are described, e.g. in Sambrook et al. (s.a.). Furthermore, DNA sequences which can be made by the polymerase chain reaction by using primers designed on the basis of the DNA sequences disclosed herein by methods known in the art are also an object of the present invention. It is understood that the DNA sequences of the present invention can also be made synthetically as described, e.g. in EP 747 483.

[0011] Further provided by the present invention is a recombinant DNA, preferably a vector and/or plasmid, comprising a sequence coding for an enzyme functional in the mevalonate pathway or the reaction pathway from isopentenyl pyrophosphate to farnesyl pyrophosphate. The said recombinant DNA vector and/or plasmid may comprise regulatory regions such as promoters and terminators as well as open reading frames of the above-named DNAs.

[0012] The present invention also provides the use of the said recombinant DNA, vector or plasmid to transform a host organism. The recombinant organism obtained by use of the recombinant DNA is capable of overexpressing a DNA sequence encoding an enzyme involved in the mevalonate pathway or the reaction pathway from isopentenyl pyrophosphate to farnesyl pyrophosphate. The host organism transformed with the recombinant DNA may be useful in the improvement of the production process of isoprenoids and carotenoids, in particular astaxanthin. Thus the present invention also provides such a recombinant organism/transformed host.

[0013] The present invention further provides a method for the production of isoprenoids or carotenoids, preferably carotenoids, which comprises cultivating the recombinant organism thus obtained.

[0014] The present invention also relates to a method for producing an enzyme involved in the mevalonate pathway or the reaction pathway from isopentenyl pyrophosphate to farnesyl pyrophosphate, which comprises culturing a recombinant organism mentioned above under conditions conducive to the production of said enzyme, and relates also to the enzyme itself.

[0015] The present invention will be understood more easily on the basis of the enclosed figures and the more 
detailed explanations given below. 

Fig. 1 depicts a scheme of the deduced biosynthetic pathway from acetyl-CoA to astaxanthin in P. rhodozyma.

Fig. 2 shows the expression study using an artificial mvk gene obtained by an artificial nucleotide addition at the amino terminal end of the pseudo-mvk gene from P. rhodozyma. The cells from 50 µl of broth were subjected to 10 % sodium dodecyl sulfate - polyacrylamide gel electrophoresis (SDS-PAGE). Lane 1, E. coli (M15 (pREP4) (pQE30) without IPTG); Lane 2, E. coli (M15 (pREP4) (pQE30) with 1 mM IPTG); Lane 3, molecular weight marker (105 kDa, 82.0 kDa, 49.0 kDa, 33.3 kDa and 28.6 kDa, top to bottom, BIO-RAD); Lane 4, E. coli (M15 (pREP4) (pMK1209 #3334) without IPTG); Lane 5, E. coli (M15 (pREP4) (pMK1209 #3334) with 1 mM IPTG).

[0016] The present invention provides isolated DNA sequences which code for enzymes which are involved in a biological pathway comprising the mevalonate pathway or the reaction pathway from isopentenyl pyrophosphate to farnesyl pyrophosphate. The said enzymes can be exemplified by those involved in the mevalonate pathway or the reaction pathway from isopentenyl pyrophosphate to farnesyl pyrophosphate in Phaffia rhodozyma, such as 3-hydroxy-3-methylglutaryl-CoA synthase, 3-hydroxy-3-methylglutaryl-CoA reductase, mevalonate kinase, mevalonate pyrophosphate decarboxylase and farnesyl pyrophosphate synthase. The present invention is useful for the production of the compounds involved in the biological pathway from the mevalonate pathway to the carotenogenic pathway and various products derived from such compounds. The compounds involved in the mevalonate pathway are acetoacetyl-CoA, 3-hydroxy-3-methylglutaryl-CoA, mevalonic acid, mevalonate-phosphate, mevalonate-pyrophosphate and isopentenyl-pyrophosphate. Subsequently, isopentenyl-pyrophosphate is converted to geranylgeranyl-pyrophosphate through geranyl-pyrophosphate and farnesyl-pyrophosphate via the "Isoprene Biosynthesis" reactions as indicated in Fig. 1. The compounds involved in the carotenogenic pathway are geranylgeranyl-pyrophosphate, phytoene, lycopene, β-carotene and astaxanthin. Among the compounds involved in the above-mentioned biosynthesis, geranyl-pyrophosphate may be utilized for the production of ubiquinone. Farnesyl-pyrophosphate can be utilized for the production of sterols, such as cholesterol and ergosterol. Geranylgeranyl-pyrophosphate is a useful material for the production of vitamin K, vitamin E, chlorophyll and the like. Thus the present invention will be particularly useful when it is applied to the biological production of isoprenoids. Isoprenoids is the general term which collectively designates a series of compounds having isopentenyl-pyrophosphate as a skeleton unit. Further examples of isoprenoids are vitamin A and vitamin D3.

[0017] The said DNA of the present invention can mean a cDNA which contains only the open reading frame flanked by short fragments of its 5'- and 3'-untranslated regions, and a genomic DNA which also contains its regulatory sequences, such as its promoter and terminator, which are necessary for the expression of the gene of interest.

[0018] In general, a gene consists of several parts which have different functions from each other. In eukaryotes, genes which encode proteins are transcribed to premature messenger RNA (pre-mRNA), unlike the genes for ribosomal RNA (rRNA), small nuclear RNA (snRNA) and transfer RNA (tRNA). Although RNA polymerase II (PolII) plays a central role in this transcription event, PolII cannot start transcription by itself without a cis element covering an upstream region containing a promoter and an upstream activation sequence (UAS), and a trans-acting protein factor. At first, a transcription initiation complex which consists of several basic protein components recognizes the promoter sequence in the 5'-adjacent region of the gene to be expressed. In this event, some additional participants are required in the case of a gene which is expressed under some specific regulation, such as a heat shock response or adaptation to nutrient starvation. In such a case, a UAS is required to exist in the 5'-untranslated upstream region around the promoter sequence, and some positive or negative regulator proteins recognize and bind to the UAS. The strength of the binding of the transcription initiation complex to the promoter sequence is affected by such binding of the trans-acting factor around the promoter, and this enables the regulation of the transcription activity.

[0019] After the activation of a transcription initiation complex by phosphorylation, the transcription initiation complex initiates transcription from the transcription start site. Some parts of the transcription initiation complex are detached as an elongation complex from the promoter region toward the 3' direction of the gene (this step is called the promoter clearance event), and the elongation complex continues transcription until it reaches a termination sequence that is located in the 3'-adjacent downstream region of the gene. The pre-mRNA thus generated is modified in the nucleus by the addition of a cap structure at the cap site, which almost corresponds to the transcription start site, and by the addition of polyA stretches at the polyA signal, which is located in the 3'-adjacent downstream region. Next, intron structures are removed from the coding region and exon parts are combined to yield an open reading frame whose sequence corresponds to the primary amino acid sequence of the corresponding protein. This modification, in which a mature mRNA is generated, is necessary for stable gene expression. cDNA in general terms corresponds to the DNA sequence which is reverse-transcribed from this mature mRNA sequence. Experimentally, it can be synthesized by a reverse transcriptase derived from viral species by using a mature mRNA as a template.

[0020] To express a gene derived from a eukaryote, a procedure in which cDNA is cloned into an expression vector in E. coli is often used, as shown in this invention. This is because the specificity of intron structure varies among organisms, and an organism is unable to recognize intron sequences from other species. In fact, prokaryotes have no intron structure in their own genetic background. Even among yeasts, the genetic background differs between the ascomycetes, to which Saccharomyces cerevisiae belongs, and the basidiomycetes, to which P. rhodozyma belongs. Wery et al. showed that the intron structure of the actin gene from P. rhodozyma can be neither recognized nor spliced by the ascomycetous yeast Saccharomyces cerevisiae (Yeast, 12, 641-651, 1996).






EP0 955 363 A2 

[0021] Some other researchers reported that the intron structures of some kinds of genes are involved in the regulation of their gene expression (Dabeva, M. D. et al., Proc. Natl. Acad. Sci. U.S.A., 83, 5854, 1986). It might be important to use a genomic fragment which retains its introns in the case of self-cloning of a gene of interest whose intron structure is involved in such regulation of its own gene expression.

[0022] To apply a genetic engineering method to a strain improvement study, it is necessary to study the genetic mechanism of events such as transcription and translation. It is important to determine genetic sequences such as the UAS, promoter, intron structure and terminator in order to study the genetic mechanism.

[0023] According to this invention, the genes which code for the enzymes involved in the mevalonate pathway were cloned from genomic DNA of P. rhodozyma, and their genomic sequences containing the HMG-CoA synthase (hmc) gene, HMG-CoA reductase (hmg) gene, mevalonate kinase (mvk) gene, mevalonate pyrophosphate decarboxylase (mpd) gene and FPP synthase (fps) gene, including their 5'- and 3'-adjacent regions as well as their intron structures, were determined.

[0024] At first, we cloned partial gene fragments containing a portion of the hmc gene, hmg gene, mvk gene, mpd gene and fps gene by using the degenerate PCR method. Degenerate PCR is a method to clone a gene of interest whose amino acid sequence has high homology to that of a known enzyme from another species which has the same or a similar function. A degenerate primer, which is used as a primer in degenerate PCR, is designed by reverse translation of the amino acid sequence to the corresponding nucleotides ("degenerated"). For such a degenerate primer, a mixed primer which contains any of A, C, G or T at an ambiguous position, or a primer containing inosine at an ambiguous position, is generally used. In this invention, such mixed primers were used as degenerate primers to clone the above genes. The PCR conditions used varied depending on the primers and the genes to be cloned, as described hereinafter.
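Reverse translation into a mixed primer can be sketched in a few lines of Python using IUPAC ambiguity codes. This is an illustrative sketch only: the peptide `MWDKF` is a made-up example, not a motif from the enzymes discussed here, and the function names are hypothetical. The compact codes used for the six-codon amino acids (Leu, Arg, Ser) are a simplifying assumption that over-represents a few codons of other amino acids.

```python
from math import prod

# One IUPAC-degenerate codon per amino acid (standard genetic code).
# YTN, MGN and WSN (Leu, Arg, Ser) also match a few codons of other amino acids.
DEGENERATE_CODON = {
    "A": "GCN", "C": "TGY", "D": "GAY", "E": "GAR", "F": "TTY",
    "G": "GGN", "H": "CAY", "I": "ATH", "K": "AAR", "L": "YTN",
    "M": "ATG", "N": "AAY", "P": "CCN", "Q": "CAR", "R": "MGN",
    "S": "WSN", "T": "ACN", "V": "GTN", "W": "TGG", "Y": "TAY",
}

# Number of bases represented by each IUPAC symbol used above.
IUPAC_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2,
              "M": 2, "W": 2, "S": 2, "H": 3, "N": 4}


def degenerate_primer(peptide):
    """Reverse-translate a peptide (one-letter codes) into an IUPAC primer."""
    return "".join(DEGENERATE_CODON[aa] for aa in peptide.upper())


def degeneracy(primer):
    """Number of distinct oligonucleotides contained in the mixed primer."""
    return prod(IUPAC_SIZE[base] for base in primer)


if __name__ == "__main__":
    primer = degenerate_primer("MWDKF")  # hypothetical conserved motif
    print(primer, degeneracy(primer))
```

Motifs rich in Met and Trp, which have a single codon each, keep the degeneracy low, which is one reason such regions are usually preferred when picking primer sites.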

[0025] An entire gene containing its coding region with its introns, as well as its regulatory regions such as a promoter or terminator, can be cloned from a chromosome by screening a genomic library constructed in a phage vector or plasmid vector in an appropriate host, using as a probe a labeled partial DNA fragment obtained by degenerate PCR as described above. Generally, E. coli as a host strain and an E. coli vector, i.e. a phage vector such as a λ phage vector or a plasmid vector such as a pUC vector, are often used in the construction of a library and in subsequent genetic manipulation such as sequencing, restriction digestion, ligation and the like. In this invention, an EcoRI genomic library of P. rhodozyma was constructed in derivatives of the λ vector, λZAPII and λDASHII, depending on the insert size. The insert size, i.e. the length of insert that must be cloned, was determined by Southern blot hybridization for each gene before construction of the library. In this invention, the DNA used as a probe was labeled with digoxigenin (DIG), a steroid hapten, instead of the conventional 32P label, following the protocol prepared by the supplier (Boehringer-Mannheim). A genomic library constructed from the chromosome of P. rhodozyma was screened by using as a probe a DIG-labeled DNA fragment which contained a portion of a gene of interest. Hybridized plaques were picked up and used for further study. In the case of λDASHII (insert size from 9 kb to 23 kb), the prepared λDNA was digested with EcoRI, followed by cloning of the EcoRI insert into a plasmid vector such as pUC19 or pBluescriptII SK+. When λZAPII was used in the construction of the genomic library, the in vivo excision protocol was conveniently used for the succeeding step of cloning into the plasmid vector by using a derivative of single-stranded M13 phage, ExAssist phage (Stratagene). The plasmid DNA thus obtained was subjected to sequencing.

[0026] In this invention, we used the automated fluorescent DNA sequencer, ALFred system (Pharmacia) using an 
autocycle sequencing protocol in which the Taq DNA polymerase is employed in most cases of sequencing. 

[0027] After the determination of the genomic sequence, the sequence of the coding region was used for cloning the cDNA of the corresponding gene. The PCR method was also exploited to clone the cDNA fragment. PCR primers whose sequences were identical to the sequences at the 5'- and 3'-ends of the open reading frame (ORF) were synthesized with the addition of an appropriate restriction site, and PCR was performed by using those PCR primers. In this invention, a cDNA pool was used as a template in this PCR cloning of cDNA. The said cDNA pool consists of various cDNA species which were synthesized in vitro by the viral reverse transcriptase and Taq polymerase (the CapFinder Kit manufactured by Clontech was used) by using the mRNA obtained from P. rhodozyma as a template. The sequence of the cDNA of interest thus obtained was confirmed. Furthermore, the cDNA thus obtained was used for confirmation of its enzyme activity after cloning of the cDNA fragment into an expression vector which functions in E. coli under a strong promoter such as that of the lac or T7 expression system.
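The primer design described above, in which the ORF ends are extended by a restriction site, can be sketched as follows. Everything specific in this Python example is an assumption made for illustration: the toy ORF, the choice of BamHI (GGATCC) and HindIII (AAGCTT) sites and the annealing length are not taken from the Examples, and practical primers usually also carry a few extra 5' bases to help the restriction enzyme cut, which are omitted here.

```python
# Watson-Crick complement for an unambiguous DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orf_primers(orf, fwd_site, rev_site, anneal_len=18):
    """Build forward and reverse primers matching the two ends of an ORF.

    The forward primer is the restriction site followed by the first
    `anneal_len` bases of the ORF; the reverse primer is the restriction
    site followed by the reverse complement of the last `anneal_len` bases.
    """
    orf = orf.upper()
    forward = fwd_site + orf[:anneal_len]
    reverse = rev_site + reverse_complement(orf[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    toy_orf = "ATGGCTAAAGGTCCATTGACCGATTAA"  # hypothetical 27-nt ORF
    fwd, rev = orf_primers(toy_orf, "GGATCC", "AAGCTT", anneal_len=9)
    print(fwd)
    print(rev)
```

The sites would be chosen so that they are present in the cloning region of the expression vector, absent from the ORF itself, and keep the ORF in frame with any vector-encoded tag.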

[0028] Following the confirmation of the enzyme activity, the expressed protein would be purified and used for raising an antibody against the purified enzyme. The antibody thus prepared would be used for characterizing the expression of the corresponding enzyme in a strain improvement study, an optimization study of the culture conditions, and the like.

[0029] After the rate-limiting step is determined in a biosynthetic pathway which consists of multiple steps of enzymatic reactions, there are three strategies to enhance the enzymatic activity of the rate-limiting reaction by using its genomic sequence.

[0030] One strategy is to use the gene itself in its native form. The simplest approach is to amplify the genomic sequence including its regulatory sequences such as a promoter and a terminator. This is realized by cloning the genomic fragment encoding the enzyme of interest into an appropriate vector which harbors a selectable marker that functions in P. rhodozyma. A drug resistance gene which encodes an enzyme that enables the host to survive in the presence of a toxic antibiotic is often used as the selectable marker. The G418 resistance gene harbored in pGB-Ph9 (Wery et al., Gene, 184, 89-97, 1997) is an example of a drug resistance gene. A nutrition complementation marker can also be used in a host which has an appropriate auxotrophy marker. The P. rhodozyma ATCC24221 strain, which requires cytidine for its growth, is one example of such an auxotroph. By using CTP synthetase as donor DNA for ATCC24221, a host-vector system using nutrition complementation can be established. Two types of vectors would be used. One is an integrated vector which does not have an autonomous replicating sequence. The above pGB-Ph9 is an example of this type of vector. Because such a vector does not have an autonomous replicating sequence, it cannot replicate by itself and can be present only in an integrated form on the chromosome of the host as a result of a single-crossover recombination using the homologous sequence between the vector and the chromosome. To increase the dose of the integrated gene on the chromosome, amplification of the gene is often achieved by using such a drug resistance marker. When the concentration of the corresponding drug in the selection medium is increased, only a strain in which the integrated gene has been amplified on the chromosome as a result of recombination can survive. By using such a selection, a strain which has the amplified gene can be chosen. The other type of vector is a replicable vector which has an autonomous replicating sequence. Such a vector can exist in a multicopy state, so that the harbored gene also exists in a multicopy state. By using such a strategy, an enzyme of interest which is coded by the amplified gene is expected to be overexpressed.

[0031] Another strategy to overexpress an enzyme of interest is to place the gene of interest under a strong promoter. In such a strategy, the gene does not need to be present in a multicopy state. This strategy can also be applied to overexpress a gene of interest under an appropriate promoter whose activity is induced in an appropriate growth phase and at an appropriate timing of cultivation. Production of astaxanthin accelerates in a late phase of growth, as in the case of production of a secondary metabolite. Thus, the expression of carotenogenic genes may be maximized in a late phase of growth. In such a phase, gene expression of most biosynthetic enzymes decreases. For example, by placing a gene which is involved in the biosynthesis of a precursor of astaxanthin and whose expression is under the control of a vegetative promoter, such as a gene which encodes an enzyme involved in the mevalonate pathway, downstream of the promoter of a carotenogenic gene, all the genes which are involved in the biosynthesis of astaxanthin become synchronized in their timing and phase of expression.

[0032] Still another strategy to overexpress enzymes of interest is the induction of mutations in their regulatory elements. For this purpose, a reporter gene such as the β-galactosidase gene, the luciferase gene, a gene coding for a green fluorescent protein, and the like is inserted between the promoter and the terminator sequence of the gene of interest so that all the parts, including the promoter, the terminator and the reporter gene, are fused and function together. Transformed P. rhodozyma in which the said reporter gene is introduced on the chromosome or on a vector would be mutagenized in vivo to induce mutations within the promoter region of the gene of interest. Mutations can be monitored by detecting the change of the activity coded by the reporter gene. If the mutation occurs in a cis element of the gene, the mutation point would be determined by rescue of the mutagenized gene and sequencing. The determined mutation would be introduced into the promoter region on the chromosome by recombination between the native promoter sequence and the mutated sequence. By the same procedure, a mutation occurring in a gene which encodes a trans-acting factor can also be obtained. It would also affect the overexpression of the gene of interest.

[0033] A mutation can also be induced by in vitro mutagenesis of a cis element in the promoter region. In this approach, a gene cassette containing a reporter gene which is fused to a promoter region derived from a gene of interest at its 5'-end and a terminator region from a gene of interest at its 3'-end is mutagenized and then introduced into P. rhodozyma. By detecting the difference in the activity of the reporter gene, an effective mutation would be screened. Such a mutation can be introduced into the sequence of the native promoter region on the chromosome by the same method as in the case of the in vivo mutation approach.

[0034] As donor DNA, a gene which encodes an enzyme of the mevalonate pathway or FPP synthase could be introduced alone or co-introduced, harbored on a plasmid vector. A coding sequence which is identical to its native sequence, as well as its allelic variant or a sequence which has one or more amino acid additions, deletions and/or substitutions, can be used as long as the corresponding enzyme has the stated enzyme activity. Such a vector can be introduced into P. rhodozyma by transformation, and a transformant can be selected by spreading the transformed cells on an appropriate selection medium, such as YPD agar medium containing geneticin in the case of pGB-Ph9 as a vector, or a minimal agar medium omitting cytidine in the case of using the auxotroph ATCC24221 as a recipient.

[0035] Such a genetically engineered P. rhodozyma would be cultivated in an appropriate medium and evaluated for its productivity of astaxanthin. A hyperproducer of astaxanthin thus selected would be confirmed in view of the relationship between its productivity and the level of expression of the gene or protein introduced by such a genetic engineering method.

Examples

[0036] The following materials and methods were employed in the Examples described below:

Strains

[0037] P. rhodozyma ATCC96594 (This strain has been redeposited on April 8, 1998 as a Budapest Treaty deposit
under accession No. 74438).

E. coli DH5α: F⁻, φ80d, lacZΔM15, Δ(lacZYA-argF)U169, hsd (rK⁻, mK⁺), recA1, endA1, deoR, thi-1, supE44,
gyrA96, relA1 (Toyobo)

E. coli XL1-Blue MRF': Δ(mcrA)183, Δ(mcrCB-hsdSMR-mrr)173, endA1, supE44, recA1, gyrA96, relA1,
lac [F' proAB, lacIqZΔM15, Tn10 (tetr)] (Stratagene)

E. coli SOLR: e14⁻(mcrA), Δ(mcrCB-hsdSMR-mrr)171, sbcC, recB, recJ, umuC::Tn5(kanr), uvrC, lac, gyrA96,
relA1, thi-1, endA1, λR, [F' proAB, lacIqZΔM15] Su⁻ (nonsuppressing) (Stratagene, CA, USA)

E. coli XL1 MRA(P2): Δ(mcrA)183, Δ(mcrCB-hsdSMR-mrr)173, endA1, supE44, thi-1, gyrA96, relA1, lac (P2 lysogen)
(Stratagene)

E. coli BL21 (DE3) (pLysS): dcm⁻, ompT, rB⁻ mB⁻, lon⁻, λ(DE3), pLysS (Stratagene)

E. coli M15 (pREP4) (QIAGEN) (Zamenhof P. J. et al., J. Bacteriol. 110, 171-178, 1972)

E. coli KB822: pcnB80, zad::Tn10, Δ(lacU169), hsdR17, endA1, thi-1, supE44

E. coli TOP10: F⁻, mcrA, Δ(mrr-hsdRMS-mcrBC), φ80, ΔlacZ M15, ΔlacX74, recA1, deoR, araD139, (ara-leu)7697,
galU, galK, rpsL (Strr), endA1, nupG (Invitrogen)

Vectors

[0038]

λZAPII (Stratagene)
λDASHII (Stratagene)
pBluescriptII SK+ (Stratagene)
pUC57 (MBI Fermentas)
pMOSBlue T-vector (Amersham)
pET11c (Stratagene)
pQE30 (QIAGEN)
pCR2.1TOPO (Invitrogen)

Media

[0039] P. rhodozyma strain is maintained routinely in YPD medium (DIFCO). E. coli strain is maintained in LB medium
(10 g Bacto-tryptone, 5 g yeast extract (DIFCO) and 5 g NaCl per liter). NZY medium (5 g NaCl, 2 g MgSO4·7H2O, 5 g
yeast extract (DIFCO), 10 g NZ amine type A (Sheffield) per liter) is used for λ phage propagation in a soft agar
(0.7 % agar (WAKO)). When an agar medium was prepared, 1.5 % of agar (WAKO) was added.

Methods

[0040] General methods of molecular genetics were practiced according to Molecular cloning: a Laboratory Manual,
2nd Edition (Cold Spring Harbor Laboratory Press, 1989). Restriction enzymes and T4 DNA ligase were purchased
from Takara Shuzo (Japan).

[0041] Isolation of chromosomal DNA from P. rhodozyma was performed by using QIAGEN Genomic Kit (QIAGEN)
following the protocol supplied by the manufacturer. Mini-prep of plasmid DNA from transformed E. coli was performed
with the Automatic DNA isolation system (PI-50, Kurabo, Co. Ltd., Japan). Midi-prep of plasmid DNA from an E. coli
transformant was performed by using QIAGEN column (QIAGEN). Isolation of λ DNA was performed by Wizard lambda
preps DNA purification system (Promega) following the protocol of the manufacturer. A DNA fragment was isolated and
purified from agarose by using QIAquick or QIAEX II (QIAGEN). Manipulation of λ phage derivatives was done
according to the protocol of the manufacturer (Stratagene).

[0042] Isolation of total RNA from P. rhodozyma was performed by the phenol method using Isogen (Nippon Gene,
Japan). mRNA was purified from the total RNA thus obtained by using mRNA separation kit (Clontech). cDNA was
synthesized by using CapFinder cDNA construction kit (Clontech).

[0043] In vitro packaging was performed by using Gigapack III gold packaging extract (Stratagene).
[0044] Polymerase chain reaction (PCR) was performed with the thermal cycler from Perkin Elmer, model 2400. Each
PCR condition is described in the examples. PCR primers were purchased from a commercial supplier or synthesized
with a DNA synthesizer (model 392, Applied Biosystems). Fluorescent DNA primers for DNA sequencing were purchased
from Pharmacia. DNA sequencing was performed with the automated fluorescent DNA sequencer (ALFred, Pharmacia).

[0045] Competent cells of DH5α were purchased from Toyobo (Japan). Competent cells of M15 (pREP4) were
prepared by the CaCl2 method as described by Sambrook et al. (Molecular cloning: a Laboratory Manual, 2nd Edition,
Cold Spring Harbor Laboratory Press, 1989).

Example 1 Isolation of mRNA from P. rhodozyma and construction of cDNA library

[0046] To construct a cDNA library of P. rhodozyma, total RNA was isolated by the phenol extraction method right
after cell disruption, and the mRNA from P. rhodozyma ATCC96594 strain was purified by using mRNA separation kit
(Clontech).

[0047] At first, cells of ATCC96594 strain from 10 ml of a two-day culture in YPD medium were harvested by
centrifugation (1500 x g for 10 min.) and washed once with extraction buffer (10 mM Na-citrate/HCl (pH 6.2)
containing 0.7 M KCl). After suspending in 2.5 ml of extraction buffer, the cells were disrupted by French press
homogenizer (Ohtake Works Corp., Japan) at 1500 kgf/cm2 and immediately mixed with two volumes of Isogen (Nippon
Gene) according to the method specified by the manufacturer. In this step, 400 µg of total RNA was recovered.

[0048] Then this total RNA was purified by using mRNA separation kit (Clontech) according to the method specified
by the manufacturer. Finally, 16 µg of mRNA from P. rhodozyma ATCC96594 strain was obtained.
[0049] To construct the cDNA library, CapFinder PCR cDNA construction kit (Clontech) was used according to the
method specified by the manufacturer. One µg of purified mRNA was applied for a first strand synthesis followed by
PCR amplification. After this amplification by PCR, 1 mg of cDNA pool was obtained.

Example 2 Cloning of the partial hmc (3-hydroxy-3-methylglutaryl-CoA synthase) gene from P. rhodozyma

[0050] To clone a partial hmc gene from P. rhodozyma, a degenerate PCR method was exploited. Two mixed primers,
whose nucleotide sequences are shown in TABLE 1, were designed based on the common sequence of known HMG-CoA
synthase genes from other species and synthesized.

TABLE 1

Sequence of primers used in the cloning of hmc gene

Hmgs1 ; GGNAARTAYACNATHGGNYTNGGNCA (sense primer) (SEQ ID NO: 11)
Hmgs3 ; TANARNSWNSWNGTRTACATRTTNCC (antisense primer) (SEQ ID NO: 12)
(N=A, C, G or T; R=A or G, Y=C or T, H=A, T or C, S=C or G, W=A or T)
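The mixed primers of TABLE 1 each stand for a pool of oligonucleotides. As an illustrative sketch only (not part of the described method), the size of each pool can be estimated by multiplying the number of bases allowed at every position. The mapping below uses the standard IUPAC ambiguity codes given in the table legend; the function name `degeneracy` is a hypothetical helper introduced here.

```python
from math import prod

# IUPAC ambiguity codes, as listed in the legend of TABLE 1.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "H": "ACT", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides represented by a mixed primer."""
    return prod(len(IUPAC[base]) for base in primer)

SENSE = "GGNAARTAYACNATHGGNYTNGGNCA"      # SEQ ID NO: 11
ANTISENSE = "TANARNSWNSWNGTRTACATRTTNCC"  # SEQ ID NO: 12

for label, seq in (("sense", SENSE), ("antisense", ANTISENSE)):
    print(label, len(seq), degeneracy(seq))
```

Under these assumptions both primers are 26-mers, and the antisense pool is several times more complex than the sense pool, which is one reason a low annealing temperature is commonly chosen for this kind of PCR.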

[0051] After the PCR reaction of 25 cycles of 95 °C for 30 seconds, 50 °C for 30 seconds and 72 °C for 15 seconds
by using ExTaq (Takara Shuzo) as a DNA polymerase and the cDNA pool obtained in Example 1 as a template, the
reaction mixture was applied to agarose gel electrophoresis. A PCR band that had the desired length was recovered and
purified by QIAquick (QIAGEN) according to the method of the manufacturer and then ligated to pMOSBlue T-vector
(Amersham). After the transformation of competent E. coli DH5α, 6 white colonies were selected and plasmids were
isolated with the Automatic DNA isolation system. As a result of sequencing, it was found that 1 clone had a sequence
whose deduced amino acid sequence was similar to those of known hmc genes. This isolated cDNA clone was designated
as pHMC211 and used for further study.

Example 3 Isolation of genomic DNA from P. rhodozyma

[0052] To isolate genomic DNA from P. rhodozyma, QIAGEN genomic kit was used according to the method
specified by the manufacturer.

[0053] At first, cells of P. rhodozyma ATCC96594 strain from 100 ml of overnight culture in YPD medium were
harvested by centrifugation (1500 x g for 10 min.) and washed once with TE buffer (10 mM Tris-HCl (pH 8.0)
containing 1 mM EDTA). After suspending in 8 ml of Y1 buffer of the QIAGEN genomic kit, lyticase (SIGMA) was added
at a concentration of 2 mg/ml to disrupt cells by enzymatic degradation, and the reaction mixture was incubated for
90 minutes at 30 °C and then carried on to the next extraction step. Finally, 20 µg of genomic DNA was obtained.

Example 4 Southern blot hybridization by using pHMC211 as a probe

[0054] Southern blot hybridization was performed to clone a genomic fragment which contains the hmc gene from
P. rhodozyma. Two µg of genomic DNA was digested by EcoRI and subjected to agarose gel electrophoresis followed by
acidic and alkaline treatment. The denatured DNA was transferred to nylon membrane (Hybond N+, Amersham) by
using transblot (Joto Rika) for an hour. The DNA which was transferred to the nylon membrane was fixed by a heat
treatment (80 °C, 90 minutes). A probe was prepared by labeling a template DNA (EcoRI- and SalI-digested pHMC211)
with the DIG multipriming method (Boehringer Mannheim). Hybridization was performed with the method specified by
the manufacturer. As a result, a hybridized band was visualized in the range from 3.5 to 4.0 kilobases (kb).

Example 5 Cloning of a genomic fragment containing hmc gene 

[0055] Four µg of the genomic DNA was digested by EcoRI and subjected to agarose gel electrophoresis. Then,
DNA fragments whose length was within the range from 3.0 to 5.0 kb were recovered by QIAEX II gel extraction kit
(QIAGEN) according to the method specified by the manufacturer. The purified DNA was ligated to 1 µg of
EcoRI-digested and CIAP (calf intestine alkaline phosphatase)-treated λZAPII (Stratagene) at 16 °C overnight, and
packaged by Gigapack III gold packaging extract (Stratagene). The packaged extract was used to infect E. coli
XL1-Blue MRF' strain, which was overlaid with NZY medium poured onto LB agar medium. About 6000 plaques were
screened by using EcoRI- and SalI-digested pHMC211 as a probe. Two plaques hybridized to the labeled probe and were
subjected to the in vivo excision protocol according to the method specified by the manufacturer (Stratagene).
Restriction analysis and sequencing showed that the isolated plasmids had the same fragment in opposite orientations.
As a result of sequencing, the obtained EcoRI fragment contained the same nucleotide sequence as that of the pHMC211
clone. One of these plasmids was designated as pHMC526 and used for further study. A complete nucleotide sequence
was obtained by sequencing deletion derivatives of pHMC526 and by sequencing with a primer-walking procedure. The
insert fragment of pHMC526 consists of 3431 nucleotides that contain 10 complete exons, one incomplete exon and
10 introns, with about 1 kb of 3'-terminal untranslated region.

Example 6 Cloning of upstream region of hmc gene

[0056] Cloning of the 5'-adjacent region of hmc gene was performed by using Genome Walker Kit (Clontech), because
pHMC526 does not contain the 5' end of hmc gene. At first, the PCR primers whose sequences are shown in TABLE 2
were synthesized.

TABLE 2

Sequence of primers used in the cloning of 5'-adjacent region of hmc gene

Hmc21 ; GAAGAACCCCATCAAAAGCCTCGA (primary primer) (SEQ ID NO: 13)
Hmc22 ; AAAAGCCTCGAGATCCTTGTGAGCG (nested primer) (SEQ ID NO: 14)

[0057] Protocols for library construction and PCR conditions were the same as those specified by the manufacturer,
using the genomic DNA preparation obtained in Example 3 as a PCR template. The PCR fragments that had an EcoRV
site at the 5' end (0.45 kb) and that had a PvuII site at the 5' end (2.7 kb) were recovered and cloned into pMOSBlue
T-vector by using E. coli DH5α as a host strain. As a result of sequencing 5 independent clones from each construct,
it was confirmed that the 5'-adjacent region of hmc gene had been cloned, and a small part (0.1 kb) of an EcoRI
fragment was found within its 3' end. The clone obtained from the PvuII construct in the above experiment was
designated as pHMCPv708 and used for further study.
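In a nested walking PCR, the second-round primer anneals inside the product of the first round. The short sketch below is only an illustration: it checks how the nested primer (SEQ ID NO: 14) is positioned relative to the primary primer (SEQ ID NO: 13) by finding the longest suffix of the primary primer that is also a prefix of the nested primer. The helper name `suffix_prefix_overlap` is hypothetical.

```python
def suffix_prefix_overlap(first: str, second: str) -> int:
    """Length of the longest suffix of `first` that is also a prefix of `second`."""
    for k in range(min(len(first), len(second)), 0, -1):
        if first[-k:] == second[:k]:
            return k
    return 0

PRIMARY = "GAAGAACCCCATCAAAAGCCTCGA"   # SEQ ID NO: 13
NESTED = "AAAAGCCTCGAGATCCTTGTGAGCG"   # SEQ ID NO: 14

overlap = suffix_prefix_overlap(PRIMARY, NESTED)
offset = len(PRIMARY) - overlap        # start of nested primer relative to primary primer
extension = len(NESTED) - overlap      # bases by which the nested primer extends further 3'
print(overlap, offset, extension)
```

With the sequences as printed in TABLE 2, the two primers share 11 bases, so the nested primer starts 13 bases into the primary primer and reaches 14 bases beyond its 3' end.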

[0058] Next, Southern blot analysis was performed by the method shown in Example 4 above, and it was determined
that the 5'-adjacent region of the hmc gene existed in a 3 kb EcoRI fragment. After construction of a 2.5 to 3.5 kb
EcoRI library in λZAPII, 600 plaques were screened and 6 positive clones were selected. As a result of sequencing of
these 6 clones, it was clarified that 4 of the 6 positive clones had the same sequence as that of pHMCPv708, and
one of those was named pHMC723 and used for further analysis.

[0059] The PCR primers whose sequences are shown in TABLE 3 were synthesized to clone the small (0.1 kb) EcoRI
fragment located between the 3.5 kb and 3.0 kb EcoRI fragments on the chromosome of P. rhodozyma.

TABLE 3

Sequence of primers used in the cloning of small EcoRI portion of hmc gene
Hmc30 ; AGAAGCCAGAAGAGAAAA (sense primer) (SEQ ID NO: 15)
Hmc31 ; TCGTCGAGGAAAGTAGAT (antisense primer) (SEQ ID NO: 16)
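For short, non-degenerate primers such as the two 18-mers of TABLE 3, a rough melting temperature can be estimated with the Wallace rule (2 °C per A or T, 4 °C per G or C). The sketch below is an illustration added for orientation and is not a calculation reported in the source; the rule is only a coarse approximation for oligonucleotides of about 14 to 20 bases.

```python
def wallace_tm(primer: str) -> int:
    """Wallace-rule estimate in degrees C: 2 per A/T plus 4 per G/C."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

HMC30 = "AGAAGCCAGAAGAGAAAA"  # sense primer, SEQ ID NO: 15
HMC31 = "TCGTCGAGGAAAGTAGAT"  # antisense primer, SEQ ID NO: 16

for name, seq in (("Hmc30", HMC30), ("Hmc31", HMC31)):
    print(name, len(seq), wallace_tm(seq))
```

Both estimates fall close to the 50 °C annealing step of the PCR program given in Example 2.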

[0060] The PCR condition was the same as shown in Example 2. The amplified fragment (0.1 kb in length) was cloned
into pMOSBlue T-vector and used to transform E. coli DH5α. Plasmids were prepared from 5 independent white colonies
and subjected to sequencing.

[0061] Thus, the nucleotide sequence (4.8 kb) containing the hmc gene was determined (SEQ ID NO: 1). The coding
region spanned 2432 base pairs and consisted of 11 exons and 10 introns. Introns were scattered all through the
coding region without 5' or 3' bias. It was found that the open reading frame encodes 467 amino acids (SEQ ID NO: 6)
whose sequence is strikingly similar to the known amino acid sequences of HMG-CoA synthases from other species
(49.6 % identity to HMG-CoA synthase from Schizosaccharomyces pombe).

Example 7 Expression of hmc gene in E. coli and confirmation of its enzymatic activity

[0062] The PCR primers whose sequences were shown in TABLE 4 were synthesized to clone a cDNA fragment of 
hmc gene. 

TABLE 4

Sequence of primers used in the cloning of cDNA of hmc gene
Hmc25 ; GGTACCATATGTATCCTTCTACTACCGAAC (sense primer) (SEQ ID NO: 17)
Hmc26 ; GCATGCGGATCCTCAAGCAGAAGGGACCTG (antisense primer) (SEQ ID NO: 18)
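The primers of TABLE 4 carry restriction sites at their 5' ends, which are later used for subcloning. The sketch below is an illustration rather than part of the described procedure: it scans the two primer sequences for a small set of well-known palindromic recognition sequences. The enzyme list is an assumption chosen for this example, and `find_sites` is a hypothetical helper.

```python
# Recognition sequences of four common restriction enzymes (all palindromic,
# so scanning one strand is sufficient).
SITES = {"KpnI": "GGTACC", "NdeI": "CATATG", "SphI": "GCATGC", "BamHI": "GGATCC"}

def find_sites(seq: str) -> dict:
    """Return {enzyme: 0-based position} for each recognition site present in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}

SENSE = "GGTACCATATGTATCCTTCTACTACCGAAC"      # SEQ ID NO: 17
ANTISENSE = "GCATGCGGATCCTCAAGCAGAAGGGACCTG"  # SEQ ID NO: 18

print(find_sites(SENSE))
print(find_sites(ANTISENSE))
```

The NdeI site in the sense primer overlaps the ATG start codon and the BamHI site in the antisense primer lies just outside the stop codon, which matches the NdeI/BamHI subcloning into the expression vector described later in this Example.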



[0063] The PCR condition was as follows: 25 cycles of 95 °C for 30 seconds, 55 °C for 30 seconds and 72 °C for 3
minutes. As a template, 0.1 µg of the cDNA pool obtained in Example 2 was used, with Pfu polymerase as the DNA
polymerase. The amplified 1.5 kb fragment was recovered and cloned in pT7Blue-3 vector (Novagen) by using perfectly
blunt cloning kit (Novagen) according to the protocol specified by the manufacturer. Six independent clones from
white colonies of E. coli DH5α transformants were selected and plasmids were prepared from those transformants. As
a result of restriction analysis, 2 clones were selected for a further selection by sequencing. One clone had an
amino acid substitution at position 280 (from glycine to alanine) and the other had one at position 53 (from alanine
to threonine). An alignment of amino acid sequences derived from known hmc genes showed that alanine as well as
glycine is frequently observed at position 280 in the sequences from other species, and this fact suggested that
the amino acid substitution at position 280 would not affect the enzymatic activity. This clone (mutant at position
280) was selected as pHMC731 for the succeeding expression experiment.

[0064] Next, the 1.5 kb fragment obtained by NdeI- and BamHI-digestion of pHMC731 was ligated to pET11c
(Stratagene) digested by the same pair of restriction enzymes, and introduced into E. coli DH5α. As a result of
restriction analysis, a plasmid that had the correct structure (pHMC818) was recovered. Then, competent E. coli
BL21 (DE3) (pLysS) cells (Stratagene) were transformed, and one clone that had the correct structure was selected
for further study.
[0065] For an expression study, strain BL21 (DE3) (pLysS) (pHMC818) and vector control strain BL21 (DE3) (pLysS)
(pET11c) were cultivated in 100 ml of LB medium at 37 °C in the presence of 100 µg/ml of ampicillin until the OD at
600 nm reached 0.8 (about 3 hours). Then, the broth was divided into two portions of the same volume, and 1 mM of
isopropyl β-D-thiogalactopyranoside (IPTG) was added to one portion. Cultivation was continued for a further 4 hours
at 37 °C. Twenty-five µl of broth was removed from the induced and uninduced cultures of the hmc clone and of the
vector control and subjected to sodium dodecyl sulfate - polyacrylamide gel electrophoresis (SDS-PAGE) analysis. It
was confirmed that a protein whose size was similar to the molecular weight deduced from the nucleotide sequence
(50.8 kDa) was expressed only in the case of the clone that harbored pHMC818 with induction. Cells from 50 ml of
broth were harvested by centrifugation (1500 x g, 10 minutes), washed once and suspended in 2 ml of hmc buffer
(200 mM Tris-HCl (pH 8.2)). Cells were disrupted by French press homogenizer (Ohtake Works) at 1500 kgf/cm2 to yield
a crude lysate. After centrifugation of the crude lysate, a supernatant fraction was recovered and used as a crude
extract for enzymatic analysis. Only in the case of the induced lysate of the pHMC818 clone, a white pellet was spun
down and recovered. Enzyme assay for 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) synthase was performed by the
photometric assay according to the method of Stewart et al. (J. Biol. Chem. 241(5), 1212-1221, 1966). In none of the
crude extracts was the activity of 3-hydroxy-3-methylglutaryl-CoA synthase detected. As a result of SDS-PAGE analysis
of the crude extract, the expressed protein band that had been found in the induced broth had disappeared.
Subsequently, the white pellet that was recovered from the crude lysate of the induced pHMC818 clone was solubilized
with 8 M guanidine-HCl, and then subjected to SDS-PAGE analysis. The expressed protein was recovered in the white
pellet, and this suggested that the expressed protein formed an inclusion body.

[0066] Next, an expression experiment under milder conditions was conducted. Cells were grown in LB medium at 28
°C and induction was performed by the addition of 0.1 mM of IPTG. Subsequently, incubation was continued for a
further 3.5 hours at 28 °C and then the cells were harvested. Preparation of the crude extract was the same as in
the previous protocol. Results are summarized in TABLE 5. HMG-CoA synthase activity was observed only in the
induced culture of the recombinant strain harboring the hmc gene, and this suggested that the cloned hmc gene
encodes HMG-CoA synthase.



TABLE 5

Enzymatic characterization of hmc cDNA clone

plasmid     IPTG    µmol of HMG-CoA / minute / mg-protein
pHMC818     -       0
            +       0.146
pET11c      -       0
            +       0



Example 8 Cloning of hmg (3-hydroxy-3-methylglutaryl-CoA reductase) gene

[0067] The cloning protocol of hmg gene was almost the same as that of the hmc gene shown in Examples 2 to 7. At
first, the PCR primers whose sequences are shown in TABLE 6 were synthesized based on the common sequences of
HMG-CoA reductase genes from other species.

TABLE 6

Sequence of primers used in the cloning of hmg gene

Red1 ; GCNTGYTGYGARAAYGTNATHGGNTAYATGCC (sense primer) (SEQ ID NO: 19)
Red2 ; ATCCARTTDATNGCNGCNGGYTTYTTRTCNGT (antisense primer) (SEQ ID NO: 20)
(N=A, C, G or T; R=A or G, Y=C or T, H=A, T or C, D=A, G or T)

[0068] After the PCR reaction of 25 cycles of 95 °C for 30 seconds, 54 °C for 30 seconds and 72 °C for 30 seconds
by using ExTaq (Takara Shuzo) as a DNA polymerase, the reaction mixture was applied to agarose gel electrophoresis.
A PCR band that had the desired length was recovered and purified by QIAquick (QIAGEN) according to the method of
the manufacturer and then ligated to pUC57 vector (MBI Fermentas). After the transformation of competent E. coli
DH5α, 7 white colonies were selected and the plasmids were isolated from those transformants. As a result of
sequencing, it was found that all the clones had a sequence whose deduced amino acid sequence was similar to those of
known HMG-CoA reductase genes. One of the isolated cDNA clones was named pRED1219 and used for further study.
[0069] Next, a genomic fragment containing the 5'- and 3'-adjacent regions of hmg gene was cloned with the Genome
Walker kit (Clontech). The 2.5 kb fragment of the 5'-adjacent region (pREDPVu1226) and the 4.0 kb fragment of the
3'-adjacent region of hmg gene (pREDEVd1226) were cloned. Based on the sequence of the insert of pREDPVu1226, PCR
primers whose sequences are shown in TABLE 7 were synthesized.



TABLE 7 

Sequence of primers used in the cloning of cDNA of hmg gene 
Red8 ; GGCCATTCCACACTTGATGCTCTGC (antisense primer) (SEQ ID NO: 21) 
Red9 ; GGCCGATATCTTTATGGTCCT (sense primer) (SEQ ID NO: 22) 



[0070] Subsequently, a cDNA fragment containing a long portion of the hmg cDNA sequence was cloned by a PCR
method by using Red8 and Red9 as PCR primers and the cDNA pool prepared in Example 2, and the plasmid thus cloned
was named pRED107. The PCR condition was as follows: 25 cycles of 30 seconds at 94 °C, 30 seconds at 55 °C and 1
minute at 72 °C.

[0071] A Southern blot hybridization study was performed to clone a genomic sequence which contains the entire hmg
gene from P. rhodozyma. A probe was prepared by labeling a template DNA, pRED107, with the DIG multipriming method.
Hybridization was performed with the method specified by the manufacturer. As a result, the labeled probe hybridized
to two bands of 12 kb and 4 kb in length. As a result of sequencing of pREDPVu1226, no EcoRI site was found
in the cloned hmg region. This suggested that another species of hmg gene (on the 4 kb hybridizing EcoRI fragment)
existed on the genome of P. rhodozyma, as found in other organisms.

[0072] Next, a genomic library consisting of 9 to 23 kb EcoRI fragments in the λDASHII vector was constructed. The
packaged extract was used to infect E. coli XL1 Blue, MRA(P2) strain (Stratagene), which was overlaid with NZY medium
poured onto LB agar medium. About 5000 plaques were screened by using a 0.6 kb fragment of StuI-digested pRED107 as a
probe. Four plaques hybridized to the labeled probe. Then a phage lysate was prepared, and DNA was purified with
Wizard lambda purification system according to the method specified by the manufacturer (Promega) and digested
with EcoRI to isolate a 10 kb EcoRI fragment, which was cloned in EcoRI-digested and CIAP-treated pBluescriptII KS-
(Stratagene). Eleven white colonies were selected and subjected to colony PCR by using Red9 and -40 universal primer
(Pharmacia). Template DNA for colony PCR was prepared by heating a cell suspension, in which a picked colony was
suspended in 10 µl of sterilized water, for 5 minutes at 99 °C prior to the PCR reaction (PCR condition: 25 cycles of
30 seconds at 94 °C, 30 seconds at 55 °C and 3 minutes at 72 °C). One colony gave a positive PCR band of 4 kb, which
suggested that this clone had an entire region containing the hmg gene. A plasmid from this positive clone was
prepared and named pRED611. Subsequently, deletion derivatives of pRED611 were made for sequencing. By combining
the sequence obtained from the deletion mutants with the sequence obtained by a primer-walking procedure, the
nucleotide sequence of 7285 base pairs which contains the hmg gene from P. rhodozyma was determined (SEQ ID NO: 2).
The hmg gene from P. rhodozyma consists of 10 exons and 9 introns. The deduced amino acid sequence, 1092 amino
acids in length (SEQ ID NO: 7), showed an extensive homology to known HMG-CoA reductases (53.0 % identity to
HMG-CoA reductase from Ustilago maydis).

Example 9 Expression of carboxyl-terminal domain of hmg gene in E. coli

[0073] Some species of prokaryotes have soluble HMG-CoA reductases or related proteins (Lam et al., J. Biol. Chem.
267, 5829-5834, 1992). However, in eukaryotes, HMG-CoA reductase is tethered to the endoplasmic reticulum via an
amino-terminal membrane domain (Skalnik et al., J. Biol. Chem. 263, 6836-6841, 1988). In fungi (i.e. Saccharomyces
cerevisiae and the smut fungus, Ustilago maydis) and in animals, the membrane domain is large and complex,
containing seven or eight transmembrane segments (Croxen et al., Microbiol. 140, 2363-2370, 1994). In contrast, the
membrane domains of plant HMG-CoA reductase proteins have only one or two transmembrane segments (Nelson et al.,
Plant Mol. Biol. 25, 401-412, 1994). Despite the difference in the structure and sequence of the transmembrane
domain, the amino acid sequences of the catalytic domain are conserved across eukaryotes, archaebacteria and
eubacteria.

[0074] Croxen et al. showed that the C-terminal domain of HMG-CoA reductase derived from the maize fungal pathogen,
Ustilago maydis, was expressed in active form in E. coli (Microbiology, 140, 2363-2370, 1994). The inventors of the
present invention tried to express a C-terminal domain of HMG-CoA reductase from P. rhodozyma in E. coli to confirm
its enzymatic activity.

[0075] At first, the PCR primers whose sequences are shown in TABLE 8 were synthesized to clone a partial cDNA
fragment of hmg gene. The sense primer sequence corresponds to the sequence which starts from the 597th amino acid
(glutamate) residue, and the lengths of the protein and cDNA which were expected to be obtained were 496 amino acids
and 1.5 kb, respectively.
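The expected sizes quoted above follow from simple arithmetic on the figures given in the text: the full-length deduced protein has 1092 residues (SEQ ID NO: 7) and the partial construct starts at residue 597. The few lines below merely restate that calculation as a sketch; counting one stop codon in the coding length is an assumption of this example.

```python
FULL_LENGTH_AA = 1092   # deduced HMG-CoA reductase length (SEQ ID NO: 7)
FIRST_RESIDUE = 597     # first residue encoded by the partial cDNA

n_residues = FULL_LENGTH_AA - FIRST_RESIDUE + 1  # inclusive count of residues
coding_bp = 3 * n_residues + 3                   # three bases per residue plus a stop codon
print(n_residues, coding_bp, round(coding_bp / 1000, 1))
```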



TABLE 8

Sequence of primers used in the cloning of a partial cDNA of hmg gene
Red54 ; GGTACCGAAGAAATTATQAAGAGTGG (sense primer) (SEQ ID NO: 23)
Red55 ; CTGCAGTCAGGCATCCACGTTCACAC (antisense primer) (SEQ ID NO: 24)

[0076] The PCR condition was as follows: 25 cycles of 95 °C for 30 seconds, 55 °C for 30 seconds and 72 °C for 3
minutes. As a template, 0.1 µg of the cDNA pool obtained in Example 2 was used, with ExTaq polymerase as the DNA
polymerase. The amplified 1.5 kb fragment was recovered and cloned in pMOSBlue T-vector (Novagen). Twelve
independent clones from white colonies of E. coli DH5α transformants were selected and plasmids were prepared from
those transformants. As a result of restriction analysis, all the clones were selected for a further selection by
sequencing. One clone had no amino acid substitution throughout the coding sequence and was named pRED908.
[0077] Next, the 1.5 kb fragment obtained by KpnI- and PstI-digestion of pRED908 was ligated to pQE30 (QIAGEN)
digested by the same pair of restriction enzymes, and transformed into E. coli KB822. As a result of restriction
analysis, a plasmid that had the correct structure (pRED1002) was recovered. Then, competent E. coli M15 (pREP4)
cells (QIAGEN) were transformed and one clone that had the correct structure was selected for further study.

[0078] For an expression study, strain M15 (pREP4) (pRED1002) and vector control strain M15 (pREP4) (pQE30)
were cultivated in 100 ml of LB medium at 30 °C in the presence of 25 µg/ml of kanamycin and 100 µg/ml of ampicillin
until the OD at 600 nm reached 0.8 (about 5 hours). Then, the broth was divided into two portions of the same volume,
and 1 mM of IPTG was added to one portion. Cultivation was continued for a further 3.5 hours at 30 °C. Twenty-five µl
of the broth was removed from the induced and uninduced cultures of the hmg clone and of the vector control and
subjected to SDS-PAGE analysis. It was confirmed that a protein whose size was similar to the molecular weight
deduced from the nucleotide sequence (52.4 kDa) was expressed only in the case of the clone that harbored pRED1002
with induction. Cells from 50 ml of broth were harvested by centrifugation (1500 x g, 10 minutes), washed once and
suspended in 2 ml of hmg buffer (100 mM potassium phosphate buffer (pH 7.0) containing 1 mM of EDTA and 10 mM of
dithiothreitol). Cells were disrupted by French press (Ohtake Works) at 1500 kgf/cm2 to yield a crude lysate. After
centrifugation of the crude lysate, a supernatant fraction was recovered and used as a crude extract for enzymatic
analysis. Only in the case of the induced lysate of the pRED1002 clone, a white pellet was spun down and recovered.
Enzyme assay for 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) reductase was performed by the photometric assay
according to the method of Servouse et al. (Biochem. J. 240, 541-547, 1986). In none of the crude extracts was the
activity of 3-hydroxy-3-methylglutaryl-CoA reductase detected. As a result of SDS-PAGE analysis of the crude extract,
the expressed protein band that had been found in the induced broth had disappeared. Next, the white pellet that was
recovered from the crude lysate of the induced pRED1002 clone was solubilized with an equal volume of 20 % SDS, and
then subjected to SDS-PAGE analysis. The expressed protein was recovered in the white pellet, and this suggested
that the expressed protein formed an inclusion body.

[0079] Next, the expression experiment was performed under milder conditions. Cells were grown in LB medium at 28
°C and induction was performed by the addition of 0.1 mM of IPTG. Then, incubation was continued for a further 3.5
hours at 28 °C and the cells were harvested. Preparation of the crude extract was the same as in the previous
protocol. Results are summarized in TABLE 9. It was shown that the activity was about 30 times higher upon
induction, and this suggested that the cloned hmg gene encodes HMG-CoA reductase.

TABLE 9

Enzymatic characterization of hmg cDNA clone

plasmid     IPTG    µmol of NADPH / minute / mg-protein
pRED1002    -       0.002
            +       0.059
pQE30       -       0
            +       0
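The roughly 30-fold induction stated in the text can be re-derived from the specific activities in TABLE 9. The sketch below only repeats that division; it assumes that the rows without a "+" mark are the uninduced cultures, and the dictionary layout and the helper `fold_induction` are hypothetical names chosen for this illustration.

```python
# Specific activities from TABLE 9, in µmol NADPH / minute / mg-protein.
# Assumption: rows without "+" correspond to cultures without IPTG.
ACTIVITY = {
    ("pRED1002", "-"): 0.002,
    ("pRED1002", "+"): 0.059,
    ("pQE30", "-"): 0.0,
    ("pQE30", "+"): 0.0,
}

def fold_induction(plasmid: str):
    """Induced / uninduced activity; None when the uninduced activity is zero."""
    base = ACTIVITY[(plasmid, "-")]
    if base == 0:
        return None
    return ACTIVITY[(plasmid, "+")] / base

print(fold_induction("pRED1002"))
print(fold_induction("pQE30"))
```

The ratio for pRED1002 comes out at about 29.5, in line with the "30 times" figure, while the vector control has no measurable activity to compare.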



Example 10 Cloning of mevalonate kinase (mvk) gene 

[0080] The cloning protocol of mvk gene was almost the same as that of the hmc gene shown in Examples 2 to 7. At
first, the PCR primers whose sequences are shown in TABLE 10 were synthesized based on the common sequences of
mevalonate kinase genes from other species.

TABLE 10 

Sequence of primers used in the cloning of mvk gene 
Mk1 ; GCNCCNGGNAARGTNATHYTNTTYGGNGA (sense primer) (SEQ ID NO: 25) 
Mk2 ; CCCCANGTNSWNACNGCRTTRTCNACNCC (antisense primer) (SEQ ID NO: 26) 
(N=A, C, G or T; R=A or G, Y=C or T, H=A, T or C, S=C or G, W=A or T) 
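Degenerate primers such as those in TABLE 10 are normally derived by back-translating a conserved peptide motif. As an illustrative sketch (the motif itself is not stated in the source), the code below expands each codon of the sense primer with the IUPAC codes from the table legend and reports which amino acids it can encode under the standard genetic code. The helper name `translate_degenerate` is hypothetical.

```python
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "H": "ACT", "N": "ACGT",
}

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_degenerate(primer: str) -> list:
    """For each complete codon, return the sorted amino acids it can encode."""
    result = []
    for i in range(0, len(primer) - len(primer) % 3, 3):
        choices = (IUPAC[base] for base in primer[i:i + 3])
        residues = {CODON_TABLE["".join(c)] for c in product(*choices)}
        result.append("".join(sorted(residues)))
    return result

MK1 = "GCNCCNGGNAARGTNATHYTNTTYGGNGA"  # sense primer, SEQ ID NO: 25
print(translate_degenerate(MK1))
```

Every codon but one resolves to a single residue; the YTN codon covers all leucine codons starting with C or T and therefore also admits phenylalanine. The trailing two bases of the primer do not form a complete codon and are ignored here.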

[0081] After the PCR reaction of 25 cycles of 95 °C for 30 seconds, 46 °C for 30 seconds and 72 °C for 15 seconds
by using ExTaq as a DNA polymerase, the reaction mixture was applied to agarose gel electrophoresis. A 0.6 kb PCR
band, whose length was as expected for a fragment containing a partial mvk gene, was recovered and purified by
QIAquick according to the method indicated by the manufacturer and then ligated to pMOSBlue T-vector. After
transformation of competent E. coli DH5α cells, 4 white colonies were selected and plasmids were isolated. As a
result of sequencing, it was found that one of the clones had a sequence whose deduced amino acid sequence was
similar to those of known mevalonate kinase genes. This cDNA clone was named pMK128 and used for further study.

[0082] Next, a partial genomic clone which contained mvk gene was cloned by PCR. The PCR primers whose
sequences are shown in TABLE 11 were synthesized based on the internal sequence of pMK128.

TABLE 11

Sequence of primers used in the cloning of genomic DNA containing mvk gene

Mk5 ; ACATGCTGTAGTCCATG (sense primer) (SEQ ID NO: 27)
Mk6 ; ACTCGGATTCCATGGA (antisense primer) (SEQ ID NO: 28)

[0083] The PCR condition was 25 cycles of 30 seconds at 94 °C, 30 seconds at 55 °C and 1 minute at 72 °C. The
amplified 1.4 kb fragment was cloned into pMOSBlue T-vector. As a result of sequencing, it was confirmed that a
genomic fragment containing mvk gene which had typical intron structures had been obtained, and this genomic clone
was named pMK224.

[0084] A Southern blot hybridization study was performed to clone a genomic fragment which contained the entire mvk
gene from P. rhodozyma. A probe was prepared by labeling a template DNA, pMK224 digested by NcoI, with the DIG
multipriming method. Hybridization was performed with the method specified by the manufacturer. As a result, the
labeled probe hybridized to a band of 6.5 kb in length. Next, a genomic library consisting of 5 to 7 kb EcoRI
fragments was constructed in the λZAPII vector. The packaged extract was used to infect E. coli XL1-Blue MRF' strain
(Stratagene), which was overlaid with NZY medium poured onto LB agar medium. About 5000 plaques were screened by
using a 0.8 kb fragment of NcoI-digested pMK224 as a probe. Seven plaques hybridized to the labeled probe. Then a
phage lysate was prepared according to the method specified by the manufacturer (Stratagene) and in vivo excision
was performed by using E. coli XL1-Blue MRF' and SOLR strains. Fourteen white colonies were selected and plasmids
were isolated from those selected transformants. Then, the isolated plasmids were digested by NcoI and subjected to
Southern blot hybridization with the same probe as in the plaque hybridization. The insert fragments of all the
plasmids hybridized to the probe, and this suggested that a genomic fragment containing mvk gene had been cloned. A
plasmid from one of the positive clones was prepared and named pMK701. About 3 kb of sequence was determined by the
primer walking procedure, and it was revealed that the 5' end of the mvk gene was not included in pMK701.
[0085] Next, a PCR primer having the sequence

TTGTTGTCGTAGCAGTGGGTGAGAG (SEQ ID NO: 29) was synthesized to clone the 5'-adjacent genomic region of the
mvk gene with the Genome Walker Kit according to the method specified by the manufacturer (Clontech). A specific 1.4 kb
PCR band was amplified and cloned into the pMOSBlue T-vector. All of the selected DH5α transformants had an insert of
the expected length. Subsequent sequencing revealed that the 5'-adjacent region of the mvk gene had been cloned. One of the
clones was designated pMKEVR715 and used for further study. Southern blot hybridization using the
genomic DNA prepared in Example 3 showed that the labeled pMKEVR715 hybridized to a 2.7 kb EcoRI band. Then a genomic
library in which EcoRI fragments of 1.4 to 3.0 kb in length were cloned into λZAPII was constructed and screened
with the 1.0 kb EcoRI fragment from pMKEVR715. Fourteen positive plaques were selected from 5000 plaques and
plasmids were prepared from those plaques by the in vivo excision procedure.

[0086] PCR primers taken from the internal sequence of pMKEVR715, whose sequences are shown in TABLE 12,
were synthesized to select a positive clone by colony PCR.

TABLE 12

PCR primers used for colony PCR to clone 5'-adjacent region of mvk gene
Mk17 ; GGAAGAGGAAGAGAAAAG (sense primer) (SEQ ID NO: 30)
Mk18 ; TTGCCGAACTCAATGTAG (antisense primer) (SEQ ID NO: 31)



[0087] The PCR condition was as follows: 25 cycles of 30 seconds at 94 °C, 30 seconds at 50 °C and 15 seconds at 72
°C. All the candidates except one clone yielded the positive 0.5 kb band. One of the clones was selected and
named pMK723 to determine the sequence of the upstream region of the mvk gene. After sequencing the 3'-region
of pMK723 and combining it with the sequence of pMK701, the genomic sequence of a 4.8 kb fragment containing the mvk
gene was determined. The mvk gene consists of 4 introns and 5 exons (SEQ ID NO: 3). The deduced amino acid
sequence, except for 4 amino acids at the amino terminal end (SEQ ID NO: 8), showed extensive homology to known
mevalonate kinases (44.3 % identity to mevalonate kinase from Rattus norvegicus).

Example 11 Expression of mvk gene by the introduction of 1 base at amino terminal region

[0088] Although the amino acid sequence showed significant homology to known mevalonate kinases, an appropriate
start codon for the mvk gene could not be found. This result suggested that the cloned gene might be a pseudogene for
mevalonate kinase. To test this assumption, PCR primers whose sequences are shown in TABLE 13 were synthesized
to introduce an artificial nucleotide that generates an appropriate start codon at the amino terminal
end.



TABLE 13 

PCR primers used for the introduction of a nucleotide into mvk gene 
Mk33 ; GGATCCATGAGAGCCCAAAAAGAAGA (sense primer) (SEQ ID NO: 32)
Mk34 ; GTCGACTCAAGCAAAAGACCAACGAC (antisense primer) (SEQ ID NO: 33) 



[0089] The artificial amino terminal sequence thus introduced was as follows: NH2-Met-Arg-Ala-Gln. After a PCR
of 25 cycles of 95 °C for 30 seconds, 55 °C for 30 seconds and 72 °C for 30 seconds using ExTaq
as the DNA polymerase, the reaction mixture was subjected to agarose gel electrophoresis. The expected 1.4 kb
PCR band was amplified and cloned into the pCR2.1 TOPO vector. After transformation of competent E. coli TOP10 cells,
6 white colonies were selected and plasmids were isolated. Sequencing showed that one clone had
only one change of amino acid residue (Asp to Gly change at the 81st amino acid residue in SEQ ID NO: 8). This plasmid
was named pMK1130 #3334 and used for further study. Then, the insert fragment of pMK1130 #3334 was cloned
into pQE30. This plasmid was named pMK1209 #3334. After transformation of the expression host, M15 (pREP4),
an expression study was conducted. The M15 (pREP4) (pMK1209 #3334) strain and the vector control strain (M15 (pREP4)
(pQE30)) were inoculated into 3 ml of LB medium containing 100 μg/ml of ampicillin. After cultivation at 37 °C for
3.75 hours, the cultured broth was divided into two portions. IPTG (1 mM) was added to one portion and incubation was
continued for 3 hours. Cells were harvested from 50 μl of broth by centrifugation and were subjected to SDS-PAGE
analysis. A protein with the expected molecular weight of 48.5 kDa was induced by the addition of IPTG in the
culture of M15 (pREP4) (pMK1209 #3334), whereas no induced protein band was observed in the vector control culture (Fig.
2). This result suggested that an activated form of the mevalonate kinase protein could be expressed by the artificial
addition of one nucleotide at the amino terminal end.
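
The relation between the Mk33 sense primer of TABLE 13 and the stated amino terminal sequence NH2-Met-Arg-Ala-Gln can be checked by translating the primer from its first ATG. The sketch below is illustrative only and assumes the standard genetic code; the helper name `translate` is hypothetical, and reading the leading GGATCC as a cloning site is an assumption not stated in the text.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons of a DNA string in frame 0 (one-letter code)."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Mk33 sense primer as printed in TABLE 13 (SEQ ID NO: 32).
MK33 = "GGATCCATGAGAGCCCAAAAAGAAGA"

start = MK33.find("ATG")          # first ATG in the primer
peptide = translate(MK33[start:])

if __name__ == "__main__":
    print(start, peptide)
```

The first four residues obtained this way are M, R, A and Q, in agreement with the Met-Arg-Ala-Gln sequence given above.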

Example 12 Cloning of the mevalonate pyrophosphate decarboxylase (mpd) gene

[0090] The cloning protocol for the mpd gene was almost the same as that for the hmc gene shown in Examples 2 to 7. First,
PCR primers based on the common sequences of mevalonate pyrophosphate decarboxylase genes from other species,
whose sequences are shown in TABLE 14, were synthesized.

TABLE 14 

Sequence of primers used in the cloning of mpd gene 
Mpd1 ; HTNAARTAYTTGGGNAARMGNGA (sense primer) (SEQ ID NO: 34) 
Mpd2 ; GCRTTNGGNCCNGCRTCRAANGTRTANGC (antisense primer) (SEQ ID NO: 35) 
(N=A, C, G or T; R=A or G, Y=C or T, H=A, T or C, M=A or C)

[0091] After a PCR of 25 cycles of 95 °C for 30 seconds, 50 °C for 30 seconds and 72 °C for 15 seconds
using ExTaq as the DNA polymerase, the reaction mixture was subjected to agarose gel electrophoresis. A 0.9 kb PCR
band, which was expected to contain a partial mpd gene, was recovered, purified with QIAquick according to
the method prepared by the manufacturer and then ligated to the pMOSBlue T-vector. After transformation of competent
E. coli DH5α cells, 6 white colonies were selected and plasmids were isolated. Two of the 6 clones had an insert of the
expected length. Sequencing showed that one of the clones had a sequence whose deduced amino acid
sequence was similar to those of known mevalonate pyrophosphate decarboxylases. This cDNA clone was designated
pMPD129 and used for further study.

[0092] Next, a partial genomic fragment containing the mpd gene was cloned by PCR. PCR under the same
condition as that used for cloning the partial cDNA fragment yielded an amplified 1.05 kb fragment, which
was cloned into the pMOSBlue T-vector. Sequencing confirmed that a genomic fragment containing
the mpd gene with typical intron structures had been obtained, and this genomic clone was named pMPD220.
[0093] A Southern blot hybridization study was performed to clone a genomic fragment containing the entire mpd
gene from P. rhodozyma. The probe was prepared by labeling a template DNA, pMPD220 digested with KpnI, by the DIG
multipriming method. Hybridization was performed by the method specified by the manufacturer. As a result, the probe
hybridized to a band of 7.5 kb in length. Next, a genomic library consisting of 6.5 to 9.0 kb EcoRI
fragments was constructed in the λZAPII vector. The packaged extract was used to infect E. coli XL1-Blue MRF' strain and
overlaid with NZY medium poured onto LB agar medium. About 6000 plaques were screened using the 0.6 kb fragment
of KpnI-digested pMPD220 as a probe. Four plaques hybridized to the labeled probe. Then a phage lysate was
prepared according to the method specified by the manufacturer (Stratagene) and in vivo excision was performed using
E. coli XL1-Blue MRF' and SOLR strains. Three white colonies derived from each of the 4 positive plaques were selected and
plasmids were isolated from those selected transformants. Then, the isolated plasmids were subjected to a colony PCR method
whose protocol was the same as that in Example 8. PCR primers based on the sequence found in pMPD129, whose sequences
are shown in TABLE 15, were synthesized and used for the colony PCR.

TABLE 15

Sequence of primers used in the colony PCR to clone a genomic mpd clone
Mpd7 ; CCGAACTCTCGCTCATCGCC (sense primer) (SEQ ID NO: 36)
Mpd8 ; CAGATCAGCGCGTGGAGTGA (antisense primer) (SEQ ID NO: 37)

[0094] The PCR condition was almost the same as that for the cloning of the mvk gene: 25 cycles of 30 seconds at 94 °C, 30 seconds
at 50 °C and 10 seconds at 72 °C. All the clones except one produced a positive 0.2 kb PCR band. A plasmid was
prepared from one of the positive clones and named pMPD701, and about 3 kb of its sequence
was determined by the primer walking procedure (SEQ ID NO: 4). There was an ORF consisting of 402 amino acids
(SEQ ID NO: 9) whose sequence was similar to the sequences of known mevalonate pyrophosphate decarboxylases
(52.3 % identity to mevalonate pyrophosphate decarboxylase from Schizosaccharomyces pombe). Also determined was
a 0.4 kb 5'-adjacent region which was expected to include its promoter sequence.

Example 13 Cloning of farnesyl pyrophosphate synthase (fps) gene

[0095] The cloning protocol for the fps gene was almost the same as that for the hmc gene shown in Examples 2 to 7. First,
PCR primers based on the common sequences of farnesyl pyrophosphate synthase genes from other species,
whose sequences are shown in TABLE 16, were synthesized.

TABLE 16 

Sequence of primers used in the cloning of fps gene 
Fps1 ; CARGCNTAYTTYYTNGTNGCNGAYGA (sense primer) (SEQ ID NO: 38) 
Fps2 ; CAYTTRTTRTCYTGDATRTCNGTNCCDATYTT (antisense primer) (SEQ ID NO: 39) 
(N=A, C, G or T; R=A or G, Y=C or T, D=A, G or T)

[0096] After a PCR of 25 cycles of 95 °C for 30 seconds, 54 °C for 30 seconds and 72 °C for 30 seconds
using ExTaq as the DNA polymerase, the reaction mixture was subjected to agarose gel electrophoresis. A PCR band of
the desired length (0.5 kb) was recovered, purified with QIAquick according to the method prepared by the
manufacturer and then ligated to the pUC57 vector. After transformation of competent E. coli DH5α cells, 6 white colonies were
selected and plasmids were then isolated. One of the plasmids, which had an insert fragment of the desired length, was
sequenced. As a result, it was found that this clone had a sequence whose deduced amino acid sequence was similar
to those of known farnesyl pyrophosphate synthases. This cDNA clone was named pFPS107 and used for further study.
[0097] Next, a genomic fragment was cloned by PCR using the same primer set of Fps1 and Fps2. The same PCR
condition as in the cloning of the partial cDNA was used. The 1.0 kb band obtained was cloned and sequenced. This
clone contained the same sequence as pFPS107 together with some typical intron fragments. This plasmid was named
pFPS113 and used for a further experiment.

[0098] Then, the 5'- and 3'-adjacent regions of the fps gene were also cloned by the method described in Example
8. First, the PCR primers whose sequences are shown in TABLE 17 were synthesized.

TABLE 17 

Sequences of primers used for a cloning of adjacent region of fps gene 
Fps7 ; ATCCTCATCCCGATGGGTGAATACT (sense for downstream cloning) (SEQ ID NO: 40) 
Fps9 ; AGGAGCGGTCAACAGATCGATGAGC (antisense for upstream cloning) (SEQ ID NO: 41) 

[0099] Amplified PCR bands were isolated and cloned into the pMOSBlue T-vector. Sequencing showed
that a 5'-adjacent region of 2.5 kb in length and a 3'-adjacent region of 2.0 kb in length had been
cloned. These plasmids were named pFPSSTu117 and pFPSSTd117, respectively. After sequencing of both
plasmids, an ORF consisting of 1068 base pairs with 8 introns was found. The deduced amino acid sequence showed
extensive homology to the known farnesyl pyrophosphate synthases from other species. Based on the sequence
determined, two PCR primers were synthesized with the sequences shown in TABLE 18 to clone a genomic fps clone
and a cDNA clone for fps gene expression in E. coli.
TABLE 18

Sequences of primers used for a cDNA and genomic fps cloning
Fps27 ; GAATTCATATGTCCACTACGCCTGA (sense primer) (SEQ ID NO: 42)
Fps28 ; GTCGACGGTACCTATCACTCCCGCC (antisense primer) (SEQ ID NO: 43)
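
The primers designed for expression cloning carry extra 5' bases ahead of the gene-specific sequence. As an illustrative check that is not part of the original disclosure, the sketch below scans the primers of TABLE 13 and TABLE 18 for a few common restriction enzyme recognition sequences. The text does not say which sites were intended, so the enzyme panel is an assumption, and the name `find_sites` is hypothetical.

```python
# Recognition sequences of some common restriction enzymes (all palindromic,
# so scanning one strand is sufficient). The choice of panel is an assumption.
SITES = {
    "BamHI": "GGATCC",
    "SalI": "GTCGAC",
    "EcoRI": "GAATTC",
    "NdeI": "CATATG",
    "KpnI": "GGTACC",
}

# Primer sequences as printed in TABLE 13 and TABLE 18.
PRIMERS = {
    "Mk33": "GGATCCATGAGAGCCCAAAAAGAAGA",
    "Mk34": "GTCGACTCAAGCAAAAGACCAACGAC",
    "Fps27": "GAATTCATATGTCCACTACGCCTGA",
    "Fps28": "GTCGACGGTACCTATCACTCCCGCC",
}


def find_sites(seq: str) -> dict:
    """Map enzyme name to the 0-based position of its first site in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

With this panel, each primer begins with one recognition sequence at position 0, and Fps27 and Fps28 each contain a second one immediately downstream.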

[0100] The PCR condition was as follows: 25 cycles of 30 seconds at 94 °C, 30 seconds at 50 °C and 30 seconds at 72
°C. One cDNA clone that had a correct sequence was selected by sequencing analysis of the clones obtained
by PCR and was named pFPS113. Next, a Southern blot hybridization study was performed to clone a genomic
fragment containing the entire fps gene from P. rhodozyma. The probe was prepared by labeling a template DNA,
pFPS113, by the DIG multipriming method. As a result, the labeled probe hybridized to a band of about 10 kb in
length.

[0101] Next, a genomic library consisting of 9 to 15 kb EcoRI fragments was constructed in the λDASHII vector. The
packaged extract was used to infect E. coli XL1-Blue MRA(P2) strain (Stratagene) and overlaid with NZY medium poured
onto LB agar medium. About 10000 plaques were screened using the 0.6 kb fragment of SacI-digested pFPS113
as a probe. Eight plaques hybridized to the labeled probe. Then a phage lysate was prepared according to the
method specified by the manufacturer (Promega). All the plaques were subjected to plaque PCR using the Fps27 and
Fps28 primers. Template DNA for the plaque PCR was prepared by heating 2 μl of a solution of phage particles for 5
minutes at 99 °C prior to the PCR. The PCR condition was the same as that of the pFPS113 cloning described above. All the
plaques gave a positive 2 kb PCR band, which suggested that these clones had the entire region containing the fps
gene. One of the λDNAs that harbored the fps gene was digested with EcoRI to isolate the 10 kb EcoRI fragment, which was cloned
into EcoRI-digested and CIAP-treated pBluescriptII KS- (Stratagene). Twelve white colonies from transformed E. coli
DH5α cells were selected, and plasmids were prepared from these clones and subjected to colony PCR using the
same primer set of Fps27 and Fps28 and the same PCR condition. The positive 2 kb band was obtained from 3 of the 12
candidates. One clone was selected and named pFPS603. It was confirmed that the sequence of the fps gene previously
determined from the sequences of pFPSSTu117 and pFPSSTd117 was almost correct, although it contained some
PCR errors. Finally, the nucleotide sequence of 4092 base pairs containing the fps gene from P.
rhodozyma was determined (Fig. 3), and an ORF consisting of 365 amino acids with 8 introns was found (SEQ ID NO: 5).
The deduced amino acid sequence (SEQ ID NO: 10) showed extensive homology to known FPP synthases (65 % identity
to FPP synthase from Kluyveromyces lactis).

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: F. HOFFMANN-LA ROCHE AG
(ii) TITLE OF INVENTION: Improvement of microbiological carotenoid production and biological materials therefor
(iii) NUMBER OF SEQUENCES: 43
(iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE:
    (B) STREET: Grenzacherstrasse 124
    (C) CITY: BASLE
    (E) COUNTRY: SWITZERLAND
    (F) ZIP: CH-4002
(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
    (D) SOFTWARE: PatentIn Release #1.0, Version #1.25
(vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
    (C) CLASSIFICATION:
(ix) TELECOMMUNICATION INFORMATION:
    (A) TELEPHONE: 061-688 25 11
    (B) TELEFAX: 061-688 13 95
    (C) TELEX: 962292/965542 hlr c



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 6370 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: Phaffia rhodozyma
    (B) STRAIN: ATCC96594
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1441..1466
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 1467..1722
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1723..1813
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 1814..1914
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1915..2535
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 2536..2621
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 2622..2867
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 2868..2942
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 2943..3897
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 3898..4030
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 4031..4516
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 4517..4616
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 4617..4909
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 4910..5007
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 5008..5081
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 5082..5195
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 5196..5446
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 5447..5523
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 5524..5756
(ix) FEATURE:
    (A) NAME/KEY: polyA_site
    (B) LOCATION: 6173
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GGAAGACATG ATGGTGTGGG TGTGAGTATG AGCGTGAGCG TGGGTATGGG CCTGGGTGTG 60

GGTATGAGCG GTGGTGGTGA TGGATGGATG GGTGGGTGGC GTGGAGGGGT CCGTGCGGCA 120 

AGATGTTTTC TCTGGGTAGG AGCGTTCTGC ATTGGGGCAG GAGAAAAAAT AGTGTGGTTA 180 

CGGGAGATCG TGGTTACATC AAGCCATCGT CACTGTAAGG CTCTGTAAGG CTCGGTTGTT 240

AAGAAGGTAA CCAAGTGTAA TCACTTGGTT CGCGGGGTGA CACTTAGGCT CTGGCGATTA 300 

ATATATCTGA AGCAGACCAA ACTATTAACA ATATACTTTT GGATAAGAGG TTTCAACAAG 360 

AATCTCAGCT TGAGGAAAAC TCTTATCCAA GAAGGCGCGA GGGCGTCCCC GTTTTATATC 420

AGGACCCCTC GCGCATTTGG TCTGCCACTA AAGATATACA TATGACGAGC CTAGAGAGGC 480 

TCGAGATCAC GAAAACTAAA AAGATGAAGC ATGAACCATG CAAACTAGAG CATGATGGAA 540 

AATGGGCGAA GAGGCATAAG GGATGGAGGG AACGAATAGC CTGTAGGGGT AACCCACGTA 600 

AGAGAACACG TGATACTTAA CCCGTATCCC TGACAGTCAC GGTGTTTCTT GAGAGTCAGT 660 

AATGTCCAGC TGTGACCTCA CGTGACTAAA CCCGACACGT GTGCTTCGAC CGAGGTGGGA 720 

CGATCTTTTT TTTGGGGGGA GAAACCGAGT GGGACGATAG AGAGGACTAC GGAGAACTGT 780 

AGTGAATTGT AGTGCGCTCA CTACGGAGAG TTCTAGTTGA GCAAGCGATG TGATTTTCAA 840 

TACAATCCCG GACTACAAGC TCTCTAATAG AGCTCTATAA TAGAAGGACA AAAGTCGTCC 900 

CACTCCTATC TCCCGCGCGT TTTAATAGAG ACCGATTGTT TTTTTCCCTA ATGTTTTATT 960 

TTCTTTCCCC GATCGGCTCA TTTTTCTTCT CTCCGCGTAT TCTTCACACA ACGCTCCCTC 1020

CGATCTTTTT TCTTCTTGTT CCTGTTCCTC TTCGTCTCCT TCCATTGTCT TCTTTCCTTC 1080 

CTTCCTTCCT TCTTGCCTCT AGCCAGCTTC AACAGCGACG TCTCTCTCTC TCTGTGTGGT 1140 

GATCTCCGAC TGTAGTGTCT CTCTCGGTCA CTTTCACGAA TCAACTTCGT TTCTTTTCTG 1200

ATCGATCGGT CGTCTTTCCC TCAATCCGTG CATACACTCA CACTTACACT CACACCCACA 1260 

CACTCAAACA CGCTAAATAA TCAGATCCGT CTCCCCTTCT TGATCTCCTT CGGCTTAGGC 1320 

AATGGCTTCC TTGTTCGGCC TCCGGCGGTC CTCAAACGAG CAGCCGCGCT CTCCTCTGCT 1380

CATCCAATCG AAGTCATCCT TTCTACCTTT GTCGTGGTCA CCTTGACGTA CTTTCAGTTG 1440 

ATGTACACCA TCAAGCACAG TAATTTGTAC GTCCGATCAT CTATTTGTCG TGTTCTCCTT 1500 

AGTCTCTTTC TCTTCCTCCT TTGTCTTTCG CGTCAGCGTG GCTGGATTTC CGTCTCCATG 1560 

TCATTTCCCT TATTTCCTCT TCCTGTCATT TGTTCCTCTA CTTTTCTTTC TCTACCTCCT 1620 

TTCCCTGTCG TTTGCTTTCC TTCGCCAGTT GACCACCGAT CCTCAGGATT CATGGCTAAC 1680 

ATGCCCAACA CAAACTTGCA TATCATCTCT CTTCGTCCAC AGTCTTTCTC AGACGATTAG 1740 

CACACAATCT ACCACCAGCT GGGTCGTCGA TGCGTTCTTC TCTTTGGGAT CCAGATACCT 1800 

TGACCTCGCG AAGGTTAGTC AGTTGACCCT CTCATGCTTC TTTTCTCTCA GTCTTGTGTG 1860 
TGCGCATATA CCCACTCATA GACATCTTCG TACGCTGCAC TTTCCCTCCC TTAGCAAGCA 1920

GACTCGGCCG ATATCTTTAT GGTCCTCCTC GGTTACGTCC TTATGCACGG CACATTCGTC 1980 

CGACTGTTCC TCAACTTTCG TCGGATGGGC GCAAACTTTT GGCTGCCAGG CATGGTTCTT 2040

GTCTCGTCCT CCTTTGCCTT CCTCACCGCC CTCCTCGCCG CCTCGATCCT CAACGTTCCG 2100 

ATCGACCCGA TCTGTCTCTC GGAAGCACTT CCCTTCCTCG TGCTCACCGT CGGATTTGAC 2160 

AAGGACTTTA CCCTCGCAAA ATCTGTGTTC AGCTCCCCAG AAATCGCACC CGTCATGCTT 2220

AGACGAAAGC CGGTGATCCA ACCAGGAGAT GACGACGATC TCGAACAGGA CGAGCACAGC 2280 

AGAGTGGCCG CCAACAAGGT TGACATTCAG TGGGCCCCTC CGGTCGCCGC CTCCCGTATC 2340 

GTCATTGGCT CGGTCGAGAA GATCGGGTCC TCGATCGTCA GAGACTTTGC CCTCGAGGTC 2400 

GCCGTCCTCC TTCTCGGAGC CGCCAGCGGG CTCGGCGGAC TCAAGGAGTT TTGTAAGCTC 2460 

GCCGCGTTAA TTTTGGTGGC CGACTGCTGC TTCACCTTTA CCTTCTATGT CGCCATCCTC 2520 

ACCGTCATGG TCGAGGTAAG CCTTTTCTTC AAGTTTCTTG CTGTCATTTT CCTTTCGACA 2580 

CGTATGCTCA TCTTTCGTTT CCGTCTCTCT CACCTTTCCA GGTTCACCGA ATCAAGATCA 2640 

TCCGGGGCTT CCGACCGGCC CACAATAACC GAACACCGAA TACTGTGCCC TCTACCCCTA 2700 

CTATCGACGG TCAATCTACC AACAGATCCG GCATCTCGTC AGGGCCTCCG GCCCGACCGA 2760 

CCGTGCCCGT GTGGAAGAAA GTCTGGAGGA AGCTCATGGG CCCAGAGATC GATTGGGCGT 2820

CCGAAGCTGA GGCTCGAAAC CCGGTTCCAA AGTTGAAGTT GCTCTTAGTA AGTAAACTTC 2880 

CTTTGTTCTT CTCATCATTC TTTATCTCCG AATCCTGACG TCGGACCCTT CTCGATTCAA 2940 

AGATCTTGGC CTTTCTTATC CTTCATATCC TCAACCTTTG CACGCCTCTG ACCGAGACCA 3000

CAGCTATCAA GCGATCGTCT AGCATACACC AGCCCATTTA TGCCGACCCT GCTCATCCGA 3060 

TCGCACAGAC AAACACGACG CTCCATCGGG CGCACAGCCT AGTCATCTTT GATCAGTTCC 3120 

TTAGTGACTG GACGACCATC GTCGGAGATC CAATCATGAG CAAGTGGATC ATCATCACCC 3180 

TGGGCGTGTC CATCCTGCTG AACGGGTTCC TCCTAAAAGG GATCGCTTCT GGCTCTGCTC 3240 

TCGGACCCGG TCGTGCCGGA GGAGGAGGAG CTGCCGCCGC CGCCGCCGTC TTGCTCGGAG 3300 

CGTGGGAAAT CGTCGATTGG AACAATGAGA CAGAGACCTC AACGAACACT CCGGCTGGTC 3360 

CACCCGGCCA CAAGAACCAG AATGTCAACC TCCGACTCAG TCTCGAGCGG GATACTGGTC 3420 

TCCTCCGTTA CCAGCGTGAG CAGGCCTACC AGGCCCAGTC TCAGATCCTC GCTCCTATTT 3480 

CACCGGTCTC TGTCGCGCCC GTCGTCTCCA ACGGTAACGG TAACGCATCG AAATCGATTG 3540 

AGAAACCAAT GCCTCGTTTG GTGGTCCCTA ACGGACCAAG ATCCTTGCCT GAATCACCAC 3600

CTTCGACGAC AGAATCAACC CCGGTCAACA AGGTTATCAT CGGTGGACCG TCCGACAGGC 3660 

CTGCCCTAGA CGGACTCGCC AATGGAAACG GTGCCGTCCC CCTTGACAAA CAAACTGTGC 3720 
TTGGCATGAG GTCGATCGAA GAATGCGAAG AAATTATGAA GAGTGGTCTC GGGCCTTACT 3780
CACTCAACGA CGAAGAATTG ATTTTGTTGA CTCAAAAGGG AAAGATTCCG CCGTACTCGC 3840 
TGGAAAAAGC ATTGCAGAAC TGTGAGCGGG CGGTCAAGAT TCGAAGGGCG GTTATCTGTA 3900
GGTCTTTTTC TCCTTTGAAT TTCAAGCCTT GGAGGAGAGG AAAGTGCTTC GGGGTACAAT 3960
ACAGGTTGTG CAAACAAACC AAGAGAAACT AAAGAAAACT TTCTTCTCCT CTCTCTCCCC 4020
TCGACGTCAG CCCGAGCATC CGTTACTAAG ACGCTGGAAA CCTCGGACTT GCCCATGAAG 4080
GATTACGACT ACTCGAAAGT GATGGGCGCA TGCTGTGAGA ACGTTGTCGG ATATATGCCT 4140
CTCCCTGTCG GAATCGCTGG TCCACTTAAC ATTGATGGCG AGGTCGTCCC CATCCCGATG 4200
GCCACCACCG AGGGAACTCT CGTGGCCTCG ACGTCGAGAG GTTGCAAAGC GCTCAACGCG 4260
GGTGGCGGAG TGACCACCGT CATCACCCAG GATGCGATGA CGAGAGGACC GGTGGTGGAT 4320
TTCCCTTCGG TCTCTCAGGC CGCACAGGCC AAACGATGGT TGGATTCGGT CGAAGGAATG 4380
GAGGTTATGG CCGCTTCGTT CAACTCGACT TCTAGATTCG CCAGGTTGCA GAGCATCAAG 4440
TGTGGAATGG CCGGCCGATC GCTATACATC CGTTTGGCGA CCAGTACCGG AGATGCGATG 4500
GGAATGAACA TGGCTGGTGA GTGCGACGAG TTTTCTTTGT TCTTCTTGTG CGGACCATGT 4560
TTTCTCATCC AGCCAATTCA TTCTTCATTC CTTCTCGGTG TTTGGCAACC TTTTAGGTAA 4620
AGGAACGGAG AAAGCTTTGG AAACCCTGTC CGAGTACTTC CCATCCATGC AGATCCTTGC 4680
TCTTTCTGGT AACTACTGTA TCGACAAGAA GCCTTCTGCC ATCAACTGGA TTGAGGGCCG 4740
TGGAAAGTCC GTGGTGGCCG AGTCGGTGAT CCCTGGAGCG ATCGTCAAGT CTGTCCTCAA 4800
GACAACGGTT GCGGATCTCG TCAACTTGAA CATTAAGAAA AACTTGATCG GAAGTGCCAT 4860
GGCAGGCAGC ATTGGAGGAT TCAACGCCCA CGCGTCGAAT ATTTTGACTG TGCGTACTTC 4920
TCTTTCCATA TTCGTCCTCG TTTAATTTCT TTTCTGTCCA GTCTTATGAC GTCTGATTGG 4980
TTCTTCTTTT CACCCACACA CATACAGTCA ATCTTCTTGG CTACAGGTCA GGATCCTGCA 5040
CAGAATGTGG AGTCCTCAAT GTGCATGACA TTGATGGAGG CGTACGTTTT TTGTTTTGTT 5100
TTCCTTCTTT TTCCATATGT TTCTACTTCT ACTTTCTTCC CGAGTCCGCC AAGCTGATAC 5160
CTTTATACGG TCCTTCTCTT TCTCATGACG AGTAGTGTGA ACGACGGAAA AGATCTACTC 5220
ATCACCTGCT CGATGCCGGC GATCGAGTGC GGAACGGTCG GTGGAGGAAC TTTCCTCCCT 5280
CCGCAAAACG CCTGTTTGCA GATGCTCGGT GTCGCAGGTG CCCATCCAGA TTCGCCCGGT 5340
CACAATGCTC GTCGACTAGC AAGAATCATC GCTGCCAGTG TGATGGCTGG AGAGTTGAGT 5400
TTGATGAGTG CTTTGGCCGC TGGTCATTTA ATCAAGGCCC ACATGAGTAA GTCTGCCACC 5460
TTTTGATAAT CAAAAGGGTC GTGGTACTGG TGTCACTGAC TGGTGACTCT TCCTGTCATG 5520
CAGAGCACAA TCGATCGACA CCTTCGACTC CTCTACCGGT CTCACCGTTG GCGACCCGAC 5580
CGAACACGCC GTCCCACCGG TCGATTGGAT TGCTCACACC GATGACGTCT TCCGCATCGG 5640
TCGCCTCGAT GTTCTCTGGG TTCGGTAGTC CGTCGACGAG CTCGCTCAAG ACGGTAGGTA 5700
GCATGGCTTG CGTCAGGGAA CGAGGGGACG AGACGAGTGT GAACGTGGAT GCCTGAACTG 5760
GGGACTCCCT TTTCTTGGTA TCCCTTCCGT TTTTCTTTCG GCCTTTGAAT CCTGTATTCT 5820
TGTCCGTTTT TTCATCTTCT CTTCCTGGTT CTCCTTCTCT CGTTCATCTG CAAAAACAAA 5880
ATTCAATCGC ATCGGTCTCT GGCATTCCAT TTGGGTTTCA AAATCAAATC AATCTCTATC 5940
TACTATCTCA AATATCTTTT TTTCATCTTT TGATTCATTT CTGTTGAAAA CTGTCTTGCC 6000
CTTCTCCTAC TTCTTATCTC TGCCTTCTTG CCAAAGTTCA ATTCGTTGTC CATCTGTGCA 6060
CTCTGATCTA TCAGTCTGTA TCAAGTACGC TCTTAAATCT GTAATTGGCT CTCGGAGGTG 6120
TCTCGTCATC TCACATATGG CTGGCGATAT GATGTGTCGG TTTCTTCCCC TCCAACAAAG 6180
GCGACGTGGC TCCTTCATCA ATCTTTGGCG CAAGCTCTCA AAATTCTCCA AAACGGCTGA 6240
CTAAGCAAGG TTTCCAAGTA CTCTCAAACC GAGCAAGGCC ATCCATCCTC AAATCAACTT 6300
GTGAAACCCT TTGTGGATAG ACCGTCCAAA CCGAGCTCTT CCCAATCTTC GCCTCCCCTT 6360
CTTCCTGCAG 6370
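
The exon and intron coordinates in the feature table for SEQ ID NO: 1 can be checked for internal consistency. The sketch below is an illustration, not part of the original listing: it sums the exon lengths and derives the intron intervals implied by adjacent exons, using coordinates copied from the feature table. The helper names are hypothetical.

```python
# Exon coordinates (1-based, inclusive) from the feature table of SEQ ID NO: 1.
EXONS_SEQ1 = [
    (1441, 1466), (1723, 1813), (1915, 2535), (2622, 2867), (2943, 3897),
    (4031, 4516), (4617, 4909), (5008, 5081), (5196, 5446), (5524, 5756),
]


def exon_lengths(exons):
    """Length of each exon, with inclusive 1-based coordinates."""
    return [end - start + 1 for start, end in exons]


def implied_introns(exons):
    """Intervals lying between consecutive exons."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


coding_length = sum(exon_lengths(EXONS_SEQ1))

if __name__ == "__main__":
    print("exon lengths:", exon_lengths(EXONS_SEQ1))
    print("implied introns:", implied_introns(EXONS_SEQ1))
    print("total exon length:", coding_length, "remainder mod 3:", coding_length % 3)
```

With the listed coordinates the exons total 3276 nucleotides, a multiple of three (1092 codons including the stop codon), and the nine implied introns coincide with the intron features given in the table.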



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 4775 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: Phaffia rhodozyma
    (B) STRAIN: ATCC96594
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1305..1361
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 1362..1504
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1505..1522
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 1523..1699
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1700..1826
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 1827..1920
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1921..2277
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 2278..2351
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 2352..2409
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 2410..2497
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 2498..2504
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 2505..2586
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 2587..2768
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 2769..2851
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 2852..2891
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 2892..2985
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 2986..3240
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 3241..3325
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 3326..3493
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 3494..3601
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 3602..3768
(ix) FEATURE:
    (A) NAME/KEY: polyA_site
    (B) LOCATION: 4043
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
CATCGAAGAG AGCGAAGTGA TTAGGGAAGC CGAAGAGGCA CTAACAACGT GGTTGTATAT 60
GTGTGTTTAT GAGTGTTATA TCGTCAAGAA CGAAGTCCAT TCATTTAGCT AGACAGGGAG 120
AGAGGGAGAA ACGTACGGGT TTACCCTATT GGACCAGTCT AAAGAGAGAA CGAGAGTTTT 180
TGGGTCGGTC ACCTGAAGAG TTTGAACCTC CACAAGTTTA TTCTAGATTA TTTCCGGGGG 240
TATGTGAAGG ATAATGTCAA ACTTTGTCCA GATTGAAGAA GGCAAGAAAG GAAAGGGGCG 300
AACGAGAGTA TCGTCCCATC TATGGGTGAC CAGTCGACCT TCTGCATCGG CGATCCCGAG 360
AATGGAAGGT TCCGATGGAT CAGAAGTAGG TTTCCTAAGC TCAAACATAG GTCATTGCGA 420
GTGAGATACA TATGCAGACT GATATGCTAG TCAAACCGAA CGAGATTTCT CTGTTTGCTT 480
TCAAAAAGAC GAACCAACCA TTTCATGTCC AAGATGGCAG GTCCTTCGAT TCTTTGAAGC 540
TCCTCCCTGA TGCGGACAGA AAAGAATAAA AAGTAGACAG ACTGTCAAGT CGACAGCGCA 600
AGTTTATCAA GCTGAGCGAG AAAACTCGAA CTTACATACC TTGGCCGTCA GTTCTGTAGA 660
CCAAGCATCG GCCTTTCCTC TTTGCGGCAG GTGTACGCGT TGGCTCACCA TCGTCACTCT 720
CGTCTCCTGA CCCGTTGCTT TCCTTGACAG CAGTCTGTTC CACAGGTTTC TCTAACTGAT 780
AGGTCCCAAC AGCAAAGATA TCTGGATGTC TATGTGAGAA CTCTACTGAG TCGGCAGAGT 840
ACACCGTATC GATATAGGCG AGTGAGGAAG CTTTGAAAGG TGAAGAAGTA GCGAAAGATC 900
ATCAGCGAAT GAGGACTATG ACAAAAAAGA AATTTTCGTA TAATCCACTG GACAAATCAC 960
CTTCCATCGT GTCCTCCAAG AGGGTTTCGT CTGAAACGTA AGGACGAGGT ATTGATAGAT 1020
GATTGACCTT GAGTACGCGG ATGGACAAGG AACGAGCCCA CTCCCAGGGC TATGTAACAC 1080
CACACGTGAC TCCACTTGAA TTGCGGCAGA TAAACGAAGT CTTACGATCG GACGACTTTG 1140
TAACCATTTA GTTATTTACC CGTCTTGTTT TCTTACTTTG ATCGTCCCAT TTTAGACACA 1200
AAAAAAGAAG CCAGAAGAGA AAAGAATAAA ACGTCTACCG TGTTCTCTCC GAATTCTTAC 1260
CACACCCACA AAACCATACA CAATCTCAAT CTAGATATCC AGTTATGTAC ACTTCTACTA 1320
CCGAACAGCG ACCCAAAGAT GTTGGAATTC TCGGTATGGA GGTATGTTGT TCAATTCTGT 1380
TTGTGTTCAA TCTTTAATCA TCTTTAGTCG ACTGACCGGT TCTTCCTTTT TTTTTCTTCA 1440
TCAAACAAAA CAACCCTTCT CGATTCATGT CATCTTTCTT TCCAATGCGC TACTCCTTCT 1500
GTAGATCTAC TTTCCTCGAC GAGTGCGTAA CTATTCTCTC TTCTGCATTC TCTCTCTATT 1560
CCCATGTTCG ATCCCTCGCC CTCATATGGG CGACTGTTTC ATCTCTTTTG CTTCCGTCCA 1620
TTCTTCTTTG ATCTTGTTCA TTTTCTACTA ATATCTCCCG ACGCGAAATA CAACACTGAC 1680
CGCGATTTCT CTCGATCAGG CCATCGCTCA CAAGGATCTC GAGGCTTTTG ATGGGGTTCC 1740
TTCCGGAAAG TACACCATCG GTCTCGGCAA CAACTTCATG GCCTTCACCG ACGACACTGA 1800
GGACATCAAC TCGTTCGCCT TGAACGGTCA GTCTCTTCCG TTTCAGCAAT CGACAGGAAA 1860
AAGGCCCAAG CGCATCTCAC TGACACCTTT CTCCGTTTTG CAATTCCATT TGATTGTTAG 1920
CTGTTTCCGG TCTTCTATCA AAGTACAACG TTGATCCCAA GTCAATCGGT CGAATTGATG 1980
TCGGAACTGA GTCCATCATT GACAAGTCCA AATCTGTCAA GACAGTCCTT ATGGACTTGT 2040
TCGAGTCCCA CGGCAACACA GATATTGAGG GTATCGACTC CAAGAATGCC TGCTACGGTT 2100
CTACCGCGGC CCTGTTCAAT GCCGTCAACT GGATCGAGTC ATCCTCTTGG GACGGAAGAA
ATGCCATTGT CTTCTGCGGA GACATTGCCA TCTACGCCGA GGGTGCTGCC CGACCTGCCG
GAGGTGCTGG TGCTTGCGCC ATCCTCATCG GACCCGACGC TCCCGTCGTC TTCGAGCGTG
AGTTCCAATC CGTCATTTTC TTCCACGGCA GCGGCTGAAA CAACCCTTAT CCGTCATTCT
CATCAATCTA GCCGTCCACG GAAACTTCAT GACCAACGCT TGGGACTTCT ACAAGCCTAA 
TCTTTCTTCG TATGTTCAAA TTTTGAAGTT TGCGCTTGGG AGAGTCTTAC ACTAATTCGG 
GGTGCTCGTA TCCTTCGAAT CGTTTGTTGC TTTATAGTGA ATACGTTCGT CTGCGCACCT 
CCTATATTTA GTTTTTGATC AAATATTGTC CATTGAATTA ACTCTGAAAC CTTCTCCTCC 
AAATAGCCCA TTGTCGATGG ACCTCTCTCC GTCACTTCCT ACGTCAACGC CATTGACAAG 
GCCTATGAAG CTTACCGAAC AAAGTATGCC AAGCGATTTG GAGGACCCAA GACTAACGGT 
GTCACCAACG GACACACCGA GGTTGCCGGT GTCAGTGCTG CGTCGTTCGA TTACCTTTTG 
TTCCACAGGT AAGCGTCATC TTCTGTATTC TCCTTAAATT CAACCGATCA ACGGAGTTAA 
TTCGTGTCAT CATATTATCT TGTTGGAACA GTCCTTACGG AAAGCAGGTT GTCAAAGGCC 
ACGGCCGACT TGTAAGCAGT CTTTTTGTAA CTCTTAGCTT GCAGATAAAA ACTTTTAGGT 
TTCTGGTACT CATTATTTAT GCATCTCTTG AATCACCTTA TCTAGTTGTA CAATGACTTC 
CGAAACAACC CCAACGACCC GGTTTTTGCT GAGGTGCCAG CCGAGCTTGC TACTTTGGAC 
ATGAAGAAAA GTCTTTCAGA CAAGAATGTC GAGAAATCTC TGATTGCTGC CTCCAAGTCT 
TCTTTCAACA AGCAGGTTGA GCCTGGAATG ACCACCGTCC GACAGCTCGG AAACTTGTAC 
ACCGCCTCTC TCTTCGGTGC TCTCGCAAGT TTGTTCTCTA ATGTTCCTGG TGACGAGCTC 
GTAAGTCTTG ATCTCTATCC CAATCATCTC TTCCTTATCA ATTGAACTGA ACTCTTTTCT 
TTAATGCTGG CTTTCTCTTG AACAGGTCGG CAAGCGCATT GCTCTCTACG CCTACGGATC 
TGGAGCTGCT GCTTCTTTCT ATGCTCTTAA GGTCAAGAGC TCAACCGCTT TCATCTCTGA 
GAAGCTTGAT CTCAACAACC GATTGAGCAA CATGAAGATT GTCCCCTGTG ATGACTTTGT 
CAAAGCTCTG AAGGTACGTT GGATAATGAC TTTTTTTGTG GACCGTGGTC TTTGTCAACC 
GCTAACAACC TTCTTGAATC GGTCTCTTTT GGTTTGAAAT TCGCTCGGCG CTTCGACACA 
GGTCCGAGAA GAGACTCACA ACGCCGTGTC ATATTCGCCC ATCGGTTCGC TTGACGATCT 
CTGGCCTGGA TCGTACTACT TGGGAGAGAT TGACAGCATG TGGCGTCGAC AGTACAAGCA 
GGTCCCTTCT GCTTGAACGG GATATTAAAA GTTTCAAAAG TTATGAAAGA GGTCGGCGAA 
GATTCAAAAT AAATAAATAT AACACCTTGC TTTTTGGCTT GTTTTCCTTC TTCACTCTCG 
TTTCCGATGT GTTTCCTCCG TTTCTTCCCT CTTTTGTTCC TTTTTCCTCC CTCTTTTGGT 
TACAATCTCT TTGGGTTTTA CAGGCTGGCA ATCTCTGTAC AATCTTCGTT CGCGTGATCC 
GACATAGATA CCGTTGTGGC ATACACCTTG CGTCTTACAT CTTTTGAGAG CTTCGGAGGT 
GATCTTGATG AAGAAAATTC ACCATTGACT CCCATCTCTT GAATGTCCTG ACTAAATTGA 



2160 

2220 

2280 

2340 

2400 

2460 

2520 

2580 

2640 

2700 

2760 

2820 

2880 

2940 

3000 

3060 

3120 

3180 

3240 

3300 

3360 

3420 

3480 

3540 

3600 

3660 

3720 

3780 

3840 

3900 

3960 

4020 

4080 
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ATTGGAAGCA ACTTATATGA AGAGCAAATT GATGGATCCA GAAAGGAACA AGTCTAGAAA 4140 

TCAGTGATTT GTGCGAAAAA TCAGCAAATG CCGCGCTGAG CCGCTCGCTG GGGAGTAGAC 4200 

5 ATTGCCCATG CGCGTGATGT TGTCTGACCG TTCTCCTCCA TTCCCCCACT CTCAACCTTC 4260 

CTCTCTTTGA GAATCGAAGA AGAAGGCGAA GAAAACCTGA CTTGATCCTT TACAGGGTGT 4320 

TTCTTTTGTT CGTATCTGAG TTACTTTTCC TCCTTTCCTT CCTGCTTGAG TGAATGACTG 4380 

10 ATCTGACTCC TCCGCCTACC TCGGCGACTG GGCTATATCT TGAGGATAGA ATATCCCCCT 4440 

GACAATCCCA TTTCTCAAGA TTCTTTCAAA CAAGAAAACT AGTTCCAATC AATAGATCAT 4500 

CTGATCAACC TTGTGTGAAC ATAATCATCT GCAGAAGCAC TGAACTGAGA AAGTCTTCCT 4560 

75 CAGAGGAAAG AGAATACTAG ATAAGATCAT TCGGTTGGGA AGGTAAAGGA ATGAAGTCTG 4620 

GTTCTGGGTT TAGCTCTGGT TCCGTAGGGG GTTCGACTAT AGTTTCTTCT GTTCGACTAG 4680 

AAACAGGAGA AACCGTACAT GTAAATGGTA TGATATTCTT GTCTCTGTAT CATGTCCCGC 4740 

20 TCATCTCTTT GTTTGCAAGT CACTCTGGAG AATTC 4775 

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4135 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(B) STRAIN: ATCC96594

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 1021..1124

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 1125..1630

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 1631..1956

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 1957..2051

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 2052..2366

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 2367..2446

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 2447..2651

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 2652..2732

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 2733..3188

(ix) FEATURE:

(A) NAME/KEY: polyA_site

(B) LOCATION: 3284

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

ACTGACTCGG CTACCGGAAA ATATCTTTTC AGGACGCCTT GATCGTTTTG GACAACACCA 60 

TGATGTCACC ATATCTTCAG CGGCCGTTGG AGCTAGGAGT AGACATTGTA TACGACTCTG 120 

GAACAAAGTA TTTGAGTGGA CACCACGATC TCATGGCTGG TGTGATTACT ACTCGTACTG 180 

AGGAGATTGG GAAGGTTCGT GCTTGCTTGC TTTGAATGTC GTGCCTAAAG CCATTGCCAT 240 

AAGACAGAGT CTGATCTATG TCGTTTGCCT ACAACAGAGA ATGGCCTGGT TCCCAAATGC 300 

TATGGGAAAT GCATTGTCTC CGTTCGACTC GTTCCTTCTT CTCCGAGGAC TCAAAACACT 360 

TCCTCTCCGA CTGGACAAGC AGCAGGCCTC ATCTCACCTG ATCGCCTCGT ACTTACACAC 420 

CCTCGGCTTT CTTGTTCACT ACCCCGGTCT GCCTTCTGAC CCTGGGTACG AACTTCATAA 480 

CTCTCAGGCG AGTGGTGCAG GTGCCGTCAT GAGCTTTGAG ACCGGAGATA TCGCGTTGAG 540 

TGAGGCCATC GTGGGCGGAA CCCGAGTTTG GGGAATCAGT GTCAGTTTCG GAGCCGTGAA 600 

CAGTTTGATC AGCATGCCTT GTCTAATGAG GTTAGTTCTT ATGCCTTCTT TTCGCGCCTT 660 

CTAAAATTTC TGGCTGACTA ATTGGGTCGG TCTTTCCGTT CTTGCATTTC AGTCACGCAT 720 

CTATTCCTGC TCACCTTCGA GCCGAGCGAG GTCTCCCCGA ACATCTGATT CGACTGTGTG 780 

TCGGTATTGA GGACCCTCAC GATTTGCTTG ATGATTTGGA GGCCTCTCTT GTGAACGCTG 840 

GCGCAATCCG ATCAGTCTCT ACCTCAGATT CATCCCGACC GCTCACTCCT CCTGCCTCTG 900 

ATTCTGCCTC GGACATTCAC TCCAACTGGG CCGTCGACCG AGCCAGACAG TTCGAGCGTG 960 

TTAGGCCTTC TAACTCGACA GCCGGCGTCG AAGGACAGCT TGCCGAACTC AATGTAGACG 1020 

ATGCAGCCAG ACTTGCGGGC GATGAGAGCC AAAAAGAAGA AATTCTTGTC AGTGCACCGG 1080 

GAAAGGTCAT TCTGTTCGGC GAACATGCTG TAGGCCATGG TGTTGTGAGT GAGAAATGAA 1140

AGCTTTATGC TCTCATTGCA TCTTAACTTT TCCTCGCCTT TTTTGTTCTC TTCATCCCGT 1200 

CTTGATTGTA GGGATGCCCC CCTTTGCCCC TTTCCCCTTC TTGCATCTGT CTATATTTCC 1260 

TTATACATTT CGCTCTTAAG AGCGTCTAGT TGTACCTTAT AACAACCTTT GGTTTTAGCA 1320 

TCCTTTGATT ATTCATTTCT CTCATCCTTC GGTCAGAGGC TTTCGGCCAT CTTTACGTCT 1380

GATTAGATTG TAATAGCAAG AACTATCTTG CTAAGCCTTT TCTCTTCCTC TTCCTCCTAT 1440

ATAAATCGAA TTCACTTTCG GACATGTTTA TTTTGGGGAA ATCATCAAGG GGTGGGGGGC 1500

CAATCCCGAC ACTAATTTTC TGCTCACGTC AAAACTCAGC GTTCAGAATC AGTCACTGAC 1560

CCTGATACGT GTCTCTATGT GTGTGGGTGT ACGTGCGAAT TGTGACTCGA CGTTCTACGC 1620 

TTAAAAACAG ACCGGGATCG CTGCTTCCGT TGATCTTCGA TGCTACGCTC TTCTCTCACC 1680 

CACTGCTACG ACAACAACAT CATCGTCGTT ATCGTCTACA AACATTACCA TCTCCCTAAC 1740

GGACCTGAAC TTTACGCAGT CTTGGCCTGT TGATTCTCTT CCTTGGTCAC TTGCGCCTGA 1800 

CTGGACTGAG GCGTCTATTC CAGAATCTCT CTGCCCGACA TTGCTCGCCG AAATCGAAAG 1860

GATCGCTGGT CAAGGTGGAA ACGGAGGAGA AAGGGAGAAG GTGGCAACCA TGGCATTCTT 1920

GTATTTGTTG GTGCTATTGA GCAAAGGGAA GCCAAGGTAG GTTTTTTCTG TCTCTTCTTT 1980 

TTGCCTATAA AGACTCTTAA CTGACGGAGA AAGTGTTGGG TTTCTTCCTT CGGGGGTTCA 2040 

ATCAATTAAA GTGAGCCGTT CGAGTTGACG GCTCGATCTG CGCTTCCGAT GGGAGCTGGT 2100 

CTGGGTTCAT CCGCCGCTCT ATCGACCTCT CTTGCCCTAG TCTTTCTTCT CCACTTTTCT 2160 

CACCTCAGTC CAACGACGAC TGGCAGAGAA TCAACAATCC CGACGGCCGA CACAGAAGTA 2220 

ATTGACAAAT GGGCGTTCTT AGCTGAAAAA GTCATCCATG GAAATCCGAG TGGGATTGAT 2280 

AACGCGGTCA GTACGAGAGG AGGCGCTGTT GCTTTCAAAA GAAAGATTGA GGGAAAACAG 2340 

GAAGGTGGAA TGGAAGCGAT CAAGAGGTAC GCAGACACGG TGCTTCATAT GCCATACTCC 2400

AGTCTGATTG ACCCATGATG AACGTCTTTC TACATTTCGA ATATAGCTTC ACATCCATTC 2460 

GATTCCTCAT CACAGATTCT CGTATCGGAA GGGATACAAG ATCTCTCGTT GCAGGAGTGA 2520

ATGCTCGACT GATTCAGGAG CCAGAGGTGA TCGTCCCTTT GTTGGAAGCG ATTCAGCAGA 2580 

TTGCCGATGA GGCTATTCGA TGCTTGAAAG ATTCAGAGAT GGAACGTGCT GTCATGATCG 2640 

ATCGACTTCA AGTTAGTTCT TGTTCCTTTC AAGACTCTTT GTGACATTGT GTCTTATCCA 2700

TTTCATCTTC TTTTTTCTTC CTTCTTCTGC AGAACTTGGT CTCCGAGAAC CACGCACACC 2760 

TAGCAGCACT TGGCGTGTCC CACCCATCCC TCGAAGAGAT TATCCGGATC GGTGCTGATA 2820 

AGCCTTTCGA GCTTCGAACA AAGTTGACAG GCGCCGGTGG AGGTGGTTGC GCTGTAACCC 2880 

TGGTGCCCGA TGGTAAAGTC TCTCCTTTTC TCTTCCGTCC AAGCGACACA TCTGACCGAT 2940

GCGCATCCTG TACTTTTGGT CAACCAGACT TCTCGACTGA AACCCTTCAA GCTCTTATGG 3000 

AGACGCTCGT TCAATCATCG TTCGCCCCTT ATATTGCCCG AGTGGGTGGT TCAGGCGTCG 3060 

GATTCCTTTC ATCAACTAAG GCCGATCCGG AAGATGGGGA GAACAGACTT AAAGATGGGC 3120 

TGGTGGGAAC GGAGATTGAT GAGCTAGACA GATGGGCTTT GAAAACGGGT CGTTGGTCTT 3180 

TTGCTTGAAC GAAAGATAGG AAACGGTGAT TAGGGTACAG ATCCTTTGCT GTCATTTTTA 3240 

CAAAACACTT TCTTATGTCT TCATGACTCA ACGTATGCCC TCATCTCTAT CCATAGACAG 3300

CACGGTACCT CTCAGGTTTC AATACGTAAG CGTTCATCGA CAAAACATGC GGCACACGAA 3360

AACGAGTGGA TATAAGGGAG AAGAGAGATA TTAGAGCGAA AAAGAGAAGA GTGAGAGAGG 3420

AAAAAAATAA CCGAGAACAA CTTATTCCGG TTTGTTAGAA TCGAAGATCG AGAAATATGA 3480 

AGTACATAGT ATAAAGTAAA GAAGAGAGGT TTACCTCAGA GGTGTGTACG AAGGTGAGGA 3540 

CAGGTAAGAG GAATAATTGA CTATCGAAAA AAGAGAACTC AACAGAAGCA CTGGGATAAA 3600 

GCCTAGAATG TAAGTCTCAT CGGTCCGCGA TGAAAGAGAA ATTGAAGGAA GAAAAAGCCC 3660 

CCAGTAAACA ATCCAACCAA CCTCTTGGAC GATTGCGAAA CACACACACG CACGCGGACA 3720 

TATTTCGTAC ACAAGGACGG GACATTCTTT TTTTATATCC GGGTGGGGAG AGAGAGGGTT 3780 

ATAGAGGATG AATAGCAAGG TTGATGTTTT GTAAAAGGTT GCAGAAAAAG GAAAGTGAGA 3840 

GTAGGAACAT GCATTAAAAA CCTGCCCAAA GCGATTTATA TCGTTCTTCT GTTTTCACTT 3900 

CTTTCCGGGC GCTTTCTTAG ACCGCGGTGG TGAAGGGTTA CTCCTGCCAA CTAGAAGAAG 3960 

CAACATGAGT CAAGGATTAG ATCATCACGT GTCTCATTTG ACGGGTTGAA AGATATATTT 4020 

AGATACTAAC TGCTTCCCAC GCCGACTGAA AAGATGAATT GAATCATGTC GAGTGGCAAC 4080 

GAACGAAAGA ACAAATAGTA AGAATGAATT ACTAGAAAAG ACAGAATGAC TAGAA 4135 
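
The FEATURE entries in these listings give exon and intron boundaries as 1-based, inclusive positions on the genomic sequence. As a minimal sketch of how such coordinates can be used, the following Python joins exon segments into a coding sequence and translates it with the standard genetic code. The names `splice_exons`, `translate`, `toy_genomic` and `toy_exons` are hypothetical, and the toy sequence is invented for illustration; it is not taken from any SEQ ID NO in this document.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def splice_exons(genomic, exons):
    """Join exon segments given as 1-based, inclusive (start, end) pairs."""
    return "".join(genomic[start - 1:end] for start, end in exons)


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Invented toy example: exon 3..8, intron 9..19 (GT...AG), exon 20..25.
toy_genomic = "CCATGGCTGTAAGTTTCAGAAATAACC"
toy_exons = [(3, 8), (20, 25)]

cds = splice_exons(toy_genomic, toy_exons)
print(cds)
print(translate(cds))
```

Applied to a listed genomic sequence, the same two steps would take the exon LOCATION ranges in the order given and should reproduce the corresponding amino acid sequence, provided the sequence has been transcribed without errors.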

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2767 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(B) STRAIN: ATCC96594

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 401..451

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 452..633

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 634..876

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 877..1004

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 1005..1916

(ix) FEATURE:

(A) NAME/KEY: polyA_site

(B) LOCATION: 2217

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GAATTCTTCC CGACTGGGCT GATCGACTTG ACTGGAAGAT CTAAGGCGGA GGGATGAAGG 60 

AAGTAATTGG AGGGAATGAG GAAAAAAAAA GGCGAGGGAA CGCGGTCTTC TTTCCTGGCA 120 

AGGCAATGTC GTGTATCTCT CTTGATTCTT TCGTTGTATC GACGGACCAC ACTCTTTTCG 180 

AATGAATATC ACTATCGCAT CCAATGATCG CTATACATGG CATTTACATA TGCCAGACAT 240 

CGCTGAGAAA GAGAGAACAT TCCTTTGGAA AAAGCCTACT GTGCCTGAAG TCAGGCTGAT 300 

GTTGATTAAA CGTCTTTCCC CATCCTAAGC AGACAAACAA CTTCTTTTCG TTCAACACAC 360 

CACCTCTCTC CGAAAAAGCT CTTCAATCCA GTCCATTAAG ATGGTTCATA TCGCTACTGC 420

CTCGGCTCCC GTTAACATTG CGTGTATCAA GGTCCGTCTG CATTGTGAAT GCTGCTCGTT 480 

TGCCTTGTGT GCGTTTGGTG GATCTGAAAG AACCCTTGCT TGAACCATTC CATCTCTGCT 540 

CTTTTTCTTC CTGTCCTTTC CTTTTTCTCA CGACAAAAAA ACCACCTGGA CCCTTTGTGT 600

TCCTTTCCAT TGGTGTTCAT ACACCTAACA CAGTACTGGG GTAAACGGGA TACCAAGTTG 660 

ATTCTCCCTA CAAACTCCTC CTTGTCTGTC ACTCTCGACC AGGATCACCT CCGATCGACG 720 

ACGTCTTCTG CTTGTGACGC CTCGTTCGAG AAGGATCGAC TTTGGCTTAA CGGGATCGAG 780 

GAGGAGGTCA AGGCTGGTGG TCGGTTGGAT GTCTGCATCA AGGAGATGAA GAAGCTTCGA 840 

GCGCAAGAGG AAGAGAAGGA TGCCGGTCTG GAGAAAGTGA GTTTTTCTCC TGTGTGCGTG 900 

TGTACTCTGT ATAGGTACCG TTGACAGGAC AGTCTTTCTG AAGAGTTTGG ATCTTACTCT 960 

TTTTTGGGGG GGTGGTGGTG TTTGAAATAA TGACCAAAAT AAAGCTCTCA TCTTTCAACG 1020 

TGCACCTTGC GTCTTACAAC AACTTCCCGA CTGCCGCTGG ACTTGCTTCC TCCGCTTCCG 1080 

GTCTAGCTGC GTTGGTCGCC TCGCTCGCCT CGCTCTACAA CCTCCCAACG AACGCATCCG 1140 

AACTCTCGCT CATCGCCCGA CAAGGTTCTG GTTCTGCCTG CCGATCGCTC TTCGGCGGGT 1200

TCGTTGCTTG GGAACAGGGC AAGCTTTCCT CTGGAACCGA CTCGTTCGCT GTTCAGGTCG 1260 

AGCCCAGGGA ACACTGGCCC TCACTCCACG CGCTGATCTG TGTAGTTTCC GACGAGAAAA 1320 

AGACGACGGC CTCGACGGCA GGCATGCAAA CCACGGTGAA CACCTCGCCT TTGCTCCAAC 1380

ACCGAATCGA ACACGTCGTT CCAGCCCGGA TGGAGGCCAT CACCCAGGCG ATCCGGGCCA 1440 

AGGATTTCGA CTCGTTCGCA AAGATCACCA TGAAGGACTC CAACCAGTTC CACGCCGTCT 1500 

GCCTCGATTC GGAACCCCCG ATCTTTTACT TGAACGATGT CTCCCGATCG ATCATCCATC 1560 

TCGTCACCGA GCTCAACAGA GTGTCCGTCC AGGCCGGCGG TCCCGTCCTT GCCGCCTACA 1620 

CGTTCGACGC CGGGCCGAAC GCGGTGATCT ACGCCGAGGA ATCGTCCATG CCGGAGATCA 1680 

TCAGGTTAAT CGAGCGGTAC TTCCCGTTGG GAACGGCTTT CGAGAACCCG TTCGGGGTTA 1740 

ACACCGAAGG CGGTGATGCC CTGAGGGAAG GCTTTAACCA GAACGTCGCC CCGGTGTTCA 1800 

GGAAGGGAAG CGTCGCCCGG TTGATTCACA CCCGGATCGG TGATGGACCC AGGACGTATG 1860

GCGAGGAGGA GAGCCTGATC GGCGAAGACG GTCTGCCAAA GGTCGTCAAG GCTTAGACTA 1920

TAGGTTGTTT CTTCTAAATT TGAGCCTTCC TCCCGCCTCC CTTCCACAAG CATAAAACAA 1980

AGGATAAACA AATGAATTAT CAAAATAACT ATAGGTTGTT TCTTCTAAAT TTGAGCCTTC 2040

CTCCCGCCTC CCTTCCACAA GCATAAAACA AAGGATAAAC AAATGAATTA TCAAAATAAA 2100

ATAAAAAGTC TGCCTTCTTT GTTTTGGAAT ACATCTTCTT TGGGACATGA CCCTTCTCCT 2160

TCTTTTCCGT ATACATCTTT TTGGGTATTT CATGGTGATC AAACAACATT GTGATCGAAA 2220

GCAGAGACGG CCATGGTGCT GGCTTTGAGC GTCTGGCGTT TTGTGTGTCC TGCACTTGAG 2280

CAACCCCAAG CTGACCGCTA GGAAAACTCA TTGATGTGAT TTATATCGTA CGATGAAAGA 2340

GAATAAAATG ATAGAAGAAC AAAGAAGAAC AAAGTAGAAG AACGTCTGAG AAGAAAGACA 2400

GGAAAATGAC ACGTACATAG TGTTCGATGA TGAATGATAT AATATTAAAT ATAAAATGAG 2460

GTAAACGTAT AGCATCACGG GATGAACGGA TGAACATGTA GTGGACAAGG TTGGGAAATA 2520

GGAATGTAGA ATCCAAGAAT CGTTGACTGA TGGACGGACG TATGTAAACA GGTACACCCC 2580

AAAGAAAAGA AAGAAAGAAA GAAAGAAAAC ACAAAGCCAA GGAAGTAAAG CAGATGGTCT 2640

TCTAAGAATA CGGCTTCAAA AAGACAGTGA ACACTCGTCG TCGAGGAATG ACAAGAAAAG 2700

TGAGAGACTA CGAAAGGAAG AAACCAAGAC GAAAAGAAGA ACGGAGATCG AACGGACAGA 2760

AATAAAG 2767



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4092 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(B) STRAIN: ATCC96594

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 852..986

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 987..1173

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 1174..1317

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 1318..1468

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 1469..1549

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 1550..1671

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 1672..1794

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 1795..1890

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 1891..1979

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 1980..2092

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 2093..2165

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 2166..2250

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 2251..2391

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 2392..2488

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 2489..2652

(ix) FEATURE:

(A) NAME/KEY: intron

(B) LOCATION: 2653..2784

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 2785..2902

(ix) FEATURE:

(A) NAME/KEY: polyA_site

(B) LOCATION: 3024

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CGCCCGGTAT CTTGCCACAG ATGCCGCCGG AGTGTCTGGC GGAGTGCTAG GAACAACGTC 60

ATCTCCATCT GACGAGCAAG CGTACCACAA GCTAGCTCTT C6TCTGTCAG AAGGACATCC 120

ACGCACCTTC CTGGCCTTCG GGGATGGCAC CTTCTCGTCG ACTTCCCATG GCCGTGCCCC 180

TGGCCTTGTG AAGATACTGT TTGCCAAGCT GAGCGCCTCC CCGCTGCTCC AGGTCCGCAA 240

GGTCCGAGAG TATTGGACGT CGAAGATATG TTCAAAGTGT CAGGCGAGTT CTCGGGAGAA 300 


AAAAAAAGCG TGGGCTCTGA AACAGTGTGG AAATGTCTAC AAAGTGAGCT GGATTTATTG 360 

TGTGTGTATG TGTGTGTGTG TGTATGTTCT GTGTTGGTTG CTCACTGTAC TCTATGCTCT 420 

CTCTTAGATT TGGGGAACAG TGCTGTGAAC GCGTCGCGAA ACATGCTGCA CCTAGCCCTT 480 

CACCAGAAGG AGAACCAGAG GGCGGGAATG CTGGTGTCTG ACGCTGCTAC TGCTGCTACG 540

CTAGCCGCTG AGGCTGAGGC TGGCAGAAAC TAAATCCATG ACCCATCAGA TCTTGGTGAT 600 

TCGTGGTCTG AGGACACCCA AGTCCAAAAG GGCTATATAT CGACCATCAT CCGTTGCGGT 660 

CACTCAGTAG TAACTAAAGC TATACATAGG AATGTTCTGA ACTTGATAAC CCTAACACTA 720

CGAAAATATC TCGGAAAATA GATTAATTTC CTTCTCATCT CAAACAAAAG ACACAACACC 780 

ATCAATCACG CTCCTTTCAC ACACTCTCCT TTTTGCTCTC TCGTTCGACA GAAAATAACA 840 

TCAATAGCCA AATGTCCACT ACGCCTGAAG AGAAGAAAGC AGCTCGAGCA AAGTTCGAGG 900 

CTGTCTTCCC GGTCATTGCC GATGAGATTC TCGATTATAT GAAGGGTGAA GGCATGCCTG 960 

CCGAGGCTTT GGAATGGATG AACAAGGTTC GTCAAGGGTT TCTTCTTTAT TCTTCTGGTC 1020 

TTTGTTTCGG TCGAACTGGC TTTCGAACTT GGCCTTGACC GGTTGGATCT CGGTTGTTGC 1080 

GCCAAAACGA TGTCGAAGCA AAACTTACTC TTACCTGTTC GGTTTCCTTC CTTCCGACCT 1140 

TCTCTCTACC CTTGCCTCCG ATCGGTCTTA TAGAACTTGT ACTACAACAC TCCCGGAGGA 1200 

AAACTCAACC GAGGACTTTC CGTGGTGGAT ACTTATATCC TTCTCTCGCC TTCTGGAAAA 1260 

GACATCTCGG AAGAAGAGTA CTTGAAGGCC GCTATCCTCG GTTGGTGTAT CGAGCTTGTA 1320 

CGCGTTTTCT TCATTCACCT TTCTTTCTCG TCTTCTACTC TCTTCTCTCG AACTATCTTC 1380 

CCTGCGTGTC ATCCTACACG AATCTTTATA CTTACATGTT GGAACATATG CCCTGTTCTT 1440 

AATTCACCTC TTTTGTCTCG GATGGTAGCT CCAAGCTTAC TTCTTGGTGG CTGATGATAT 1500

GATGGACGCC TCAATCACCC GACGAGGCCA ACCCTGTTGG TACAAAGTTG TTAGTCCCTT 1560 

CTTCTCTTTC TGTCCTCTTT CTTCTGAGCT ATGCCAATTC TTGATTGAAA TCGGTGGTGC 1620 

CGTCCGGACT AATCCGTTTG TCGTTTTTAT CATATCTTCT TGCACAAACA GGAGGGAGTG 1680

TCTAACATTG CCATCAACGA CGCGTTCATG CTCGAGGGAG CTATCTACTT TTTGCTCAAG 1740 

AAGCACTTCC GAAAGCAGAG CTACTATGTC GATCTGCTAG AGCTCTTCCA CGATGTTTGT 1800 

CTCTATTTCT TTTCTTCCTC CCCTCAATAA ACTGTATTTG TGACCATTCT GGATCCTTTC 1860 

CTGACGATGA ATCATTCTTC GGATGAGTAG GTTACTTTCC AAACCGAGTT GGGACAGCTC 1920 

ATCGATCTGT TGACCGCTCC TGAGGATCAC GTCGATCTCG ACAAGTTCTC CCTTAACAAG 1980 

TATGCCCGTC ATATATTCGT TTTGTTGCAT TCACGTCTGA TTGTCAGCTC CGATTATTGA 2040 

CTCTGATGGT GATGGTATTG ACCACATCAT GCGATGTTTG ACTTTCTCGT AGGCACCACC 2100 

TCATCGTTGT TTACAAGACC GCTTTCTATT CATTCTACCT TCCTGTCGCA CTCGCTATGC 2160

GAATGGTGGG TCTCTCTCTT CAACTGTTCT TCCTGATTTT CTTGACCATC TGTAACATAA 2220

ATCCTTGGAA TTTTGAACTC TATGTCATAG GTCGGCGTGA CAGATGAGGA GGCGTACAAG 2280

CTTGCGCTCT CGATCCTCAT CCCGATGGGT GAATACTTTC AAGTTCAGGA TGATGTGCTC 2340

GACGCGTTCG CTCCTCCGGA GATCCTTGGA AAGATCGGAA CCGACATCTT GGTGCGTTTT 2400

CGTTCCTTCC TTCTACGTTC TGTTTTCTAT CTTCTGACTC CCCGTCCATC ATTTATGCTT 2460

CTGTTAAAAC GTATTGAAAC ATCAAAAGGA CAACAAATGT TCATGGCCTA TCAACCTTGC 2520

ACTCTCTCTC GCCTCGCCCG CTCAGCGAGA GATTCTCGAT ACTTCGTACG GTCAGAAGAA 2580

CTCGGAGGCA GAGGCCAGAG TCAAGGCTCT GTACGCTGAG CTTGATATCC AGGGAAAGTT 2640

CAACGCTTAT GAGTATGTCA TCTTTTTTAA ATTTTCTAAT TTTCTTTTCA TCTCTTGTTC 2700

CCAAGAATTA TTTTGTGAAA GTTCTGGGAC TGAACATGGT GCATCCCTTT GGGTTCACTC 2760

CGCATATGTC TCCCGTTTGA ATAGGCAACA GAGTTACGAG TCGCTGAACA AGTTGATTGA 2820

CAGTATTGAC GAAGAGAAGA GTGGACTCAA GAAAGAAGTC TTCCACAGCT TCCTGGGTAA 2880

GGTCTATAAG CGAAGCAAGT AATTCTCCTC TTTATATGCA AAGGGAAGAT TTTGGCGGGA 2940

GTGATAGGTA GGAAGAGAAG GGAGGGTCAT ATTCATTAGG CATTTCTCTT GCAGATATAG 3000

ATGATCAAAA AGGGATATCG GTCCTCTTCT TTGTTCCGAA TACATAATAA GTCATACGAA 3060

GCCGAACATG ACAAAAGTGG TTCATGAGAT CAAACTTTTT GCATGATCTT CTGCGATTTT 3120

GTACAATTCT CTCGCATCCT ATTAGGATCG AACCAGGAGA AGATGAGAGA AGGAAACCCT 3180

CACCCCGTCA GATAACAAAC GAGAAGTCTC ATCACACACA CACACAGATG AAAGAGAAAA 3240

ATAAACTGAC GAGGATAACT TCCAATCCGA TTTTTCCAGC CCACGAACCT TCCTTGGTCC 3300

CCGCTCCGGT GCCTTCGAGT CCGATCAATG GGGCCCAAAC GCCTGAAGAT CCAAAGAACC 3360

CTTGTTGAGG TGTATTTCTC GTCTGAGCAA TCTTAGATCC TTCAATTTGC AGTCGCGCAT 3420

ATATACCATC AACATCATCG TCATCACCAT CATTGTCGTC CACAACAGCA CCGCAACGCC 3480

GTTAATGGCA GGGCTTGGAC AACTTGAGGC GGTTTCTAGC AGGTCGGACC GATTGGAGCT 3540

CGACCCAGGG TGCACATCAC CAAGACACAT TCTCCTTCAA ATGAGCGAAC AAGACATAAT 3600

GAGGGAAGTA GTACGCTATC GAACGTCTTC TCACATCCCG GGTTCTTGGC GTATCTTTTG 3660

GCGATTCTTT TTGTTGAAAT AGAAAATTGA AGAGAAAAAA AGAGATCCAC ATGATGAAGA 3720

ACGGCTCTGT AGATTCATGC TCGAAAGAAA GAAAGAAAGA AAAAGAGGGG AACGAACGGA 3780

TCTGAATCTG TGGCCAACCA AAAAGTAGGC ACAAAGATGA CAACAGCGCC CTCTTCGACA 3840

AGTCTTTGAA CTGCTTGTGG ATGAGACAAG TCCCAGCAGA TCAACATTCC TGCTTTACCC 3900

CATGGAGTAT CAAACACCTG AGAATAGGTC TTGCCCGGCT GTAGATAATC TCTGGACCGT 3960

CATATGCGCG AAACGATCAG TACGACCGAC TCTACTCGAA GTCGTCAAGA GCACGGACGA 4020

GAACGAAAAG AGGACAAACC GCTCTGGATG CCATAAATTT CTCTTCTCAT ACCTCTCCCA 4080

CCCACCCTCA GG 4092






EP0 955 363 A2 






(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1091 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma 

(B) STRAIN: ATCC96594 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Tyr Thr Ile Lys His Ser Asn Phe Leu Ser Gln Thr Ile Ser Thr
1 5 10 15

Gln Ser Thr Thr Ser Trp Val Val Asp Ala Phe Phe Ser Leu Gly Ser
20 25 30

Arg Tyr Leu Asp Leu Ala Lys Gln Ala Asp Ser Ala Asp Ile Phe Met
35 40 45

Val Leu Leu Gly Tyr Val Leu Met His Gly Thr Phe Val Arg Leu Phe
50 55 60

Leu Asn Phe Arg Arg Met Gly Ala Asn Phe Trp Leu Pro Gly Met Val
65 70 75 80

Leu Val Ser Ser Ser Phe Ala Phe Leu Thr Ala Leu Leu Ala Ala Ser
85 90 95

Ile Leu Asn Val Pro Ile Asp Pro Ile Cys Leu Ser Glu Ala Leu Pro
100 105 110

Phe Leu Val Leu Thr Val Gly Phe Asp Lys Asp Phe Thr Leu Ala Lys
115 120 125

Ser Val Phe Ser Ser Pro Glu Ile Ala Pro Val Met Leu Arg Arg Lys
130 135 140

Pro Val Ile Gln Pro Gly Asp Asp Asp Asp Leu Glu Gln Asp Glu His
145 150 155 160

Ser Arg Val Ala Ala Asn Lys Val Asp Ile Gln Trp Ala Pro Pro Val
165 170 175

Ala Ala Ser Arg Ile Val Ile Gly Ser Val Glu Lys Ile Gly Ser Ser
180 185 190

Ile Val Arg Asp Phe Ala Leu Glu Val Ala Val Leu Leu Leu Gly Ala
195 200 205

Ala Ser Gly Leu Gly Gly Leu Lys Glu Phe Cys Lys Leu Ala Ala Leu
210 215 220

Ile Leu Val Ala Asp Cys Cys Phe Thr Phe Thr Phe Tyr Val Ala Ile
225 230 235 240

Leu Thr Val Met Val Glu Val His Arg Ile Lys Ile Ile Arg Gly Phe
245 250 255

Arg Pro Ala His Asn Asn Arg Thr Pro Asn Thr Val Pro Ser Thr Pro
260 265 270

Thr Ile Asp Gly Gln Ser Thr Asn Arg Ser Gly Ile Ser Ser Gly Pro
275 280 285

Pro Ala Arg Pro Thr Val Pro Val Trp Lys Lys Val Trp Arg Lys Leu
290 295 300

Met Gly Pro Glu Ile Asp Trp Ala Ser Glu Ala Glu Ala Arg Asn Pro
305 310 315 320

Val Pro Lys Leu Lys Leu Leu Leu Ile Leu Ala Phe Leu Ile Leu His
325 330 335

Ile Leu Asn Leu Cys Thr Pro Leu Thr Glu Thr Thr Ala Ile Lys Arg
340 345 350

Ser Ser Ser Ile His Gln Pro Ile Tyr Ala Asp Pro Ala His Pro Ile
355 360 365

Ala Gln Thr Asn Thr Thr Leu His Arg Ala His Ser Leu Val Ile Phe
370 375 380

Asp Gln Phe Leu Ser Asp Trp Thr Thr Ile Val Gly Asp Pro Ile Met
385 390 395 400

Ser Lys Trp Ile Ile Ile Thr Leu Gly Val Ser Ile Leu Leu Asn Gly
405 410 415

Phe Leu Leu Lys Gly Ile Ala Ser Gly Ser Ala Leu Gly Pro Gly Arg
420 425 430

Ala Gly Gly Gly Gly Ala Ala Ala Ala Ala Ala Val Leu Leu Gly Ala
435 440 445

Trp Glu Ile Val Asp Trp Asn Asn Glu Thr Glu Thr Ser Thr Asn Thr
450 455 460

Pro Ala Gly Pro Pro Gly His Lys Asn Gln Asn Val Asn Leu Arg Leu
465 470 475 480

Ser Leu Glu Arg Asp Thr Gly Leu Leu Arg Tyr Gln Arg Glu Gln Ala
485 490 495

Tyr Gln Ala Gln Ser Gln Ile Leu Ala Pro Ile Ser Pro Val Ser Val
500 505 510

Ala Pro Val Val Ser Asn Gly Asn Gly Asn Ala Ser Lys Ser Ile Glu
515 520 525

Lys Pro Met Pro Arg Leu Val Val Pro Asn Gly Pro Arg Ser Leu Pro
530 535 540

Glu Ser Pro Pro Ser Thr Thr Glu Ser Thr Pro Val Asn Lys Val Ile
545 550 555 560

Ile Gly Gly Pro Ser Asp Arg Pro Ala Leu Asp Gly Leu Ala Asn Gly
565 570 575

Asn Gly Ala Val Pro Leu Asp Lys Gln Thr Val Leu Gly Met Arg Ser
580 585 590

Ile Glu Glu Cys Glu Glu Ile Met Lys Ser Gly Leu Gly Pro Tyr Ser
595 600 605

Leu Asn Asp Glu Glu Leu Ile Leu Leu Thr Gln Lys Gly Lys Ile Pro
610 615 620

Pro Tyr Ser Leu Glu Lys Ala Leu Gln Asn Cys Glu Arg Ala Val Lys
625 630 635 640

Ile Arg Arg Ala Val Ile Ser Arg Ala Ser Val Thr Lys Thr Leu Glu
645 650 655

Thr Ser Asp Leu Pro Met Lys Asp Tyr Asp Tyr Ser Lys Val Met Gly
660 665 670

Ala Cys Cys Glu Asn Val Val Gly Tyr Met Pro Leu Pro Val Gly Ile
675 680 685

Ala Gly Pro Leu Asn Ile Asp Gly Glu Val Val Pro Ile Pro Met Ala
690 695 700

Thr Thr Glu Gly Thr Leu Val Ala Ser Thr Ser Arg Gly Cys Lys Ala
705 710 715 720

Leu Asn Ala Gly Gly Gly Val Thr Thr Val Ile Thr Gln Asp Ala Met
725 730 735

Thr Arg Gly Pro Val Val Asp Phe Pro Ser Val Ser Gln Ala Ala Gln
740 745 750

Ala Lys Arg Trp Leu Asp Ser Val Glu Gly Met Glu Val Met Ala Ala
755 760 765

Ser Phe Asn Ser Thr Ser Arg Phe Ala Arg Leu Gln Ser Ile Lys Cys
770 775 780

Gly Met Ala Gly Arg Ser Leu Tyr Ile Arg Leu Ala Thr Ser Thr Gly
785 790 795 800

Asp Ala Met Gly Met Asn Met Ala Gly Lys Gly Thr Glu Lys Ala Leu
805 810 815

Glu Thr Leu Ser Glu Tyr Phe Pro Ser Met Gln Ile Leu Ala Leu Ser
820 825 830

Gly Asn Tyr Cys Ile Asp Lys Lys Pro Ser Ala Ile Asn Trp Ile Glu
835 840 845

Gly Arg Gly Lys Ser Val Val Ala Glu Ser Val Ile Pro Gly Ala Ile
850 855 860

Val Lys Ser Val Leu Lys Thr Thr Val Ala Asp Leu Val Asn Leu Asn
865 870 875 880

Ile Lys Lys Asn Leu Ile Gly Ser Ala Met Ala Gly Ser Ile Gly Gly
885 890 895

Phe Asn Ala His Ala Ser Asp Ile Leu Thr Ser Ile Phe Leu Ala Thr
900 905 910

Gly Gln Asp Pro Ala Gln Asn Val Glu Ser Ser Met Cys Met Thr Leu
915 920 925

Met Glu Ala Val Asn Asp Gly Lys Asp Leu Leu Ile Thr Cys Ser Met
930 935 940

Pro Ala Ile Glu Cys Gly Thr Val Gly Gly Gly Thr Phe Leu Pro Pro
945 950 955 960

Gln Asn Ala Cys Leu Gln Met Leu Gly Val Ala Gly Ala His Pro Asp
965 970 975

Ser Pro Gly His Asn Ala Arg Arg Leu Ala Arg Ile Ile Ala Ala Ser
980 985 990

Val Met Ala Gly Glu Leu Ser Leu Met Ser Ala Leu Ala Ala Gly His
995 1000 1005

Leu Ile Lys Ala His Met Lys His Asn Arg Ser Thr Pro Ser Thr Pro
1010 1015 1020

Leu Pro Val Ser Pro Leu Ala Thr Arg Pro Asn Thr Pro Ser His Arg
1025 1030 1035 1040

Ser Ile Gly Leu Leu Thr Pro Met Thr Ser Ser Ala Ser Val Ala Ser
1045 1050 1055

Met Phe Ser Gly Phe Gly Ser Pro Ser Thr Ser Ser Leu Lys Thr Val
1060 1065 1070

Gly Ser Met Ala Cys Val Arg Glu Arg Gly Asp Glu Thr Ser Val Asn
1075 1080 1085

Val Asp Ala
1090

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 467 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(B) STRAIN: ATCC96594

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Tyr Thr Ser Thr Thr Glu Gln Arg Pro Lys Asp Val Gly Ile Leu
1 5 10 15

Gly Met Glu Ile Tyr Phe Pro Arg Arg Ala Ile Ala His Lys Asp Leu
20 25 30

Glu Ala Phe Asp Gly Val Pro Ser Gly Lys Tyr Thr Ile Gly Leu Gly
35 40 45

Asn Asn Phe Met Ala Phe Thr Asp Asp Thr Glu Asp Ile Asn Ser Phe
50 55 60

Ala Leu Asn Ala Val Ser Gly Leu Leu Ser Lys Tyr Asn Val Asp Pro
65 70 75 80

Lys Ser Ile Gly Arg Ile Asp Val Gly Thr Glu Ser Ile Ile Asp Lys
85 90 95

Ser Lys Ser Val Lys Thr Val Leu Met Asp Leu Phe Glu Ser His Gly
100 105 110

Asn Thr Asp Ile Glu Gly Ile Asp Ser Lys Asn Ala Cys Tyr Gly Ser
115 120 125

Thr Ala Ala Leu Phe Asn Ala Val Asn Trp Ile Glu Ser Ser Ser Trp
130 135 140

Asp Gly Arg Asn Ala Ile Val Phe Cys Gly Asp Ile Ala Ile Tyr Ala
145 150 155 160

Glu Gly Ala Ala Arg Pro Ala Gly Gly Ala Gly Ala Cys Ala Ile Leu
165 170 175

Ile Gly Pro Asp Ala Pro Val Val Phe Glu Pro Val His Gly Asn Phe
180 185 190

Met Thr Asn Ala Trp Asp Phe Tyr Lys Pro Asn Leu Ser Ser Glu Tyr
195 200 205

Pro Ile Val Asp Gly Pro Leu Ser Val Thr Ser Tyr Val Asn Ala Ile
210 215 220

Asp Lys Ala Tyr Glu Ala Tyr Arg Thr Lys Tyr Ala Lys Arg Phe Gly
225 230 235 240

Gly Pro Lys Thr Asn Gly Val Thr Asn Gly His Thr Glu Val Ala Gly
245 250 255

Val Ser Ala Ala Ser Phe Asp Tyr Leu Leu Phe His Ser Pro Tyr Gly
260 265 270

Lys Gln Val Val Lys Gly His Gly Arg Leu Leu Tyr Asn Asp Phe Arg
275 280 285

Asn Asn Pro Asn Asp Pro Val Phe Ala Glu Val Pro Ala Glu Leu Ala
290 295 300

Thr Leu Asp Met Lys Lys Ser Leu Ser Asp Lys Asn Val Glu Lys Ser
305 310 315 320

Leu Ile Ala Ala Ser Lys Ser Ser Phe Asn Lys Gln Val Glu Pro Gly
325 330 335

Met Thr Thr Val Arg Gln Leu Gly Asn Leu Tyr Thr Ala Ser Leu Phe
340 345 350

Gly Ala Leu Ala Ser Leu Phe Ser Asn Val Pro Gly Asp Glu Leu Val
355 360 365

Gly Lys Arg Ile Ala Leu Tyr Ala Tyr Gly Ser Gly Ala Ala Ala Ser
370 375 380

Phe Tyr Ala Leu Lys Val Lys Ser Ser Thr Ala Phe Ile Ser Glu Lys
385 390 395 400

Leu Asp Leu Asn Asn Arg Leu Ser Asn Met Lys Ile Val Pro Cys Asp
405 410 415

Asp Phe Val Lys Ala Leu Lys Val Arg Glu Glu Thr His Asn Ala Val
420 425 430

Ser Tyr Ser Pro Ile Gly Ser Leu Asp Asp Leu Trp Pro Gly Ser Tyr
435 440 445

Tyr Leu Gly Glu Ile Asp Ser Met Trp Arg Arg Gln Tyr Lys Gln Val
450 455 460

Pro Ser Ala
465



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 432 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(B) STRAIN: ATCC96594

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Lys Glu Glu Ile Leu Val Ser Ala Pro Gly Lys Val Ile Leu Phe Gly
1 5 10 15

Glu His Ala Val Gly His Gly Val Thr Gly Ile Ala Ala Ser Val Asp
20 25 30

Leu Arg Cys Tyr Ala Leu Leu Ser Pro Thr Ala Thr Thr Thr Thr Ser
35 40 45

Ser Ser Leu Ser Ser Thr Asn Ile Thr Ile Ser Leu Thr Asp Leu Asn
50 55 60

Phe Thr Gln Ser Trp Pro Val Asp Ser Leu Pro Trp Ser Leu Ala Pro
65 70 75 80

Asp Trp Thr Glu Ala Ser Ile Pro Glu Ser Leu Cys Pro Thr Leu Leu
85 90 95

Ala Glu Ile Glu Arg Ile Ala Gly Gln Gly Gly Asn Gly Gly Glu Arg
100 105 110

Glu Lys Val Ala Thr Met Ala Phe Leu Tyr Leu Leu Val Leu Leu Ser
115 120 125

Lys Gly Lys Pro Ser Glu Pro Phe Glu Leu Thr Ala Arg Ser Ala Leu
130 135 140

Pro Met Gly Ala Gly Leu Gly Ser Ser Ala Ala Leu Ser Thr Ser Leu
145 150 155 160

Ala Leu Val Phe Leu Leu His Phe Ser His Leu Ser Pro Thr Thr Thr
165 170 175

Gly Arg Glu Ser Thr Ile Pro Thr Ala Asp Thr Glu Val Ile Asp Lys
180 185 190

Trp Ala Phe Leu Ala Glu Lys Val Ile His Gly Asn Pro Ser Gly Ile
195 200 205

Asp Asn Ala Val Ser Thr Arg Gly Gly Ala Val Ala Phe Lys Arg Lys
210 215 220

Ile Glu Gly Lys Gln Glu Gly Gly Met Glu Ala Ile Lys Ser Phe Thr
225 230 235 240

Ser Ile Arg Phe Leu Ile Thr Asp Ser Arg Ile Gly Arg Asp Thr Arg
245 250 255

Ser Leu Val Ala Gly Val Asn Ala Arg Leu Ile Gln Glu Pro Glu Val
260 265 270

Ile Val Pro Leu Leu Glu Ala Ile Gln Gln Ile Ala Asp Glu Ala Ile
275 280 285

Arg Cys Leu Lys Asp Ser Glu Met Glu Arg Ala Val Met Ile Asp Arg
290 295 300

Leu Gln Asn Leu Val Ser Glu Asn His Ala His Leu Ala Ala Leu Gly
305 310 315 320

Val Ser His Pro Ser Leu Glu Glu Ile Ile Arg Ile Gly Ala Asp Lys
325 330 335

Pro Phe Glu Leu Arg Thr Lys Leu Thr Gly Ala Gly Gly Gly Gly Cys
340 345 350

Ala Val Thr Leu Val Pro Asp Asp Phe Ser Thr Glu Thr Leu Gln Ala
355 360 365

Leu Met Glu Thr Leu Val Gln Ser Ser Phe Ala Pro Tyr Ile Ala Arg
370 375 380

Val Gly Gly Ser Gly Val Gly Phe Leu Ser Ser Thr Lys Ala Asp Pro
385 390 395 400

Glu Asp Gly Glu Asn Arg Leu Lys Asp Gly Leu Val Gly Thr Glu Ile
405 410 415

Asp Glu Leu Asp Arg Trp Ala Leu Lys Thr Gly Arg Trp Ser Phe Ala
420 425 430



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 401 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Phaffia rhodozyma 

(B) STRAIN: ATCC96594 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Met Val His Ile Ala Thr Ala Ser Ala Pro Val Asn Ile Ala Cys Ile 
1 5 10 15 

Lys Tyr Trp Gly Lys Arg Asp Thr Lys Leu Ile Leu Pro Thr Asn Ser 
20 25 30 

Ser Leu Ser Val Thr Leu Asp Gln Asp His Leu Arg Ser Thr Thr Ser 
35 40 45 

Ser Ala Cys Asp Ala Ser Phe Glu Lys Asp Arg Leu Trp Leu Asn Gly 
50 55 60 

Ile Glu Glu Glu Val Lys Ala Gly Gly Arg Leu Asp Val Cys Ile Lys 
65 70 75 80 

Glu Met Lys Lys Leu Arg Ala Gln Glu Glu Glu Lys Asp Ala Gly Leu 
85 90 95 

Glu Lys Leu Ser Ser Phe Asn Val His Leu Ala Ser Tyr Asn Asn Phe 
100 105 110 

Pro Thr Ala Ala Gly Leu Ala Ser Ser Ala Ser Gly Leu Ala Ala Leu 
115 120 125 

Val Ala Ser Leu Ala Ser Leu Tyr Asn Leu Pro Thr Asn Ala Ser Glu 
130 135 140 

Leu Ser Leu Ile Ala Arg Gln Gly Ser Gly Ser Ala Cys Arg Ser Leu 
145 150 155 160 

Phe Gly Gly Phe Val Ala Trp Glu Gln Gly Lys Leu Ser Ser Gly Thr 
165 170 175 

Asp Ser Phe Ala Val Gln Val Glu Pro Arg Glu His Trp Pro Ser Leu 
180 185 190 

His Ala Leu Ile Cys Val Val Ser Asp Glu Lys Lys Thr Thr Ala Ser 
195 200 205 

Thr Ala Gly Met Gln Thr Thr Val Asn Thr Ser Pro Leu Leu Gln His 
210 215 220 

Arg Ile Glu His Val Val Pro Ala Arg Met Glu Ala Ile Thr Gln Ala 
225 230 235 240 

Ile Arg Ala Lys Asp Phe Asp Ser Phe Ala Lys Ile Thr Met Lys Asp 
245 250 255 

Ser Asn Gln Phe His Ala Val Cys Leu Asp Ser Glu Pro Pro Ile Phe 
260 265 270 

Tyr Leu Asn Asp Val Ser Arg Ser Ile Ile His Leu Val Thr Glu Leu 
275 280 285 

Asn Arg Val Ser Val Gln Ala Gly Gly Pro Val Leu Ala Ala Tyr Thr 
290 295 300 

Phe Asp Ala Gly Pro Asn Ala Val Ile Tyr Ala Glu Glu Ser Ser Met 
305 310 315 320 

Pro Glu Ile Ile Arg Leu Ile Glu Arg Tyr Phe Pro Leu Gly Thr Ala 
325 330 335 

Phe Glu Asn Pro Phe Gly Val Asn Thr Glu Gly Gly Asp Ala Leu Arg 
340 345 350 

Glu Gly Phe Asn Gln Asn Val Ala Pro Val Phe Arg Lys Gly Ser Val 
355 360 365 

Ala Arg Leu Ile His Thr Arg Ile Gly Asp Gly Pro Arg Thr Tyr Gly 
370 375 380 

Glu Glu Glu Ser Leu Ile Gly Glu Asp Gly Leu Pro Lys Val Val Lys 
385 390 395 400 

Ala 
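
The protein listings above use the standard three-letter residue codes. As a reading aid only, the following minimal Python sketch converts one printed row into one-letter codes and rejects any token that is not a standard residue, which may help when checking a transcription of the listing. The function name `to_one_letter` is hypothetical and is not part of the patent; the example row is residues 49-64 of SEQ ID NO: 9 as printed.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_row: str) -> str:
    """Convert a whitespace-separated row of three-letter codes.

    Raises ValueError on any token that is not a standard residue code,
    so transcription errors surface immediately.
    """
    letters = []
    for token in three_letter_row.split():
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognized residue code: {token!r}")
        letters.append(THREE_TO_ONE[token])
    return "".join(letters)


# Residues 49-64 of SEQ ID NO: 9, copied from the listing.
row = "Ser Ala Cys Asp Ala Ser Phe Glu Lys Asp Arg Leu Trp Leu Asn Gly"
one_letter = to_one_letter(row)
print(one_letter, len(one_letter))
```

Applying the same function to every row of a sequence and summing the lengths gives a simple check against the LENGTH field of the corresponding entry.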



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 355 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Phaffia rhodozyma 
(B) STRAIN: ATCC96594 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Ser Thr Thr Pro Glu Glu Lys Lys Ala Ala Arg Ala Lys Phe Glu 
1 5 10 15 

Ala Val Phe Pro Val Ile Ala Asp Glu Ile Leu Asp Tyr Met Lys Gly 
20 25 30 

Glu Gly Met Pro Ala Glu Ala Leu Glu Trp Met Asn Lys Asn Leu Tyr 
35 40 45 

Tyr Asn Thr Pro Gly Gly Lys Leu Asn Arg Gly Leu Ser Val Val Asp 
50 55 60 

Thr Tyr Ile Leu Leu Ser Pro Ser Gly Lys Asp Ile Ser Glu Glu Glu 
65 70 75 80 

Tyr Leu Lys Ala Ala Ile Leu Gly Trp Cys Ile Glu Leu Leu Gln Ala 
85 90 95 

Tyr Phe Leu Val Ala Asp Asp Met Met Asp Ala Ser Ile Thr Arg Arg 
100 105 110 

Gly Gln Pro Cys Trp Tyr Lys Val Glu Gly Val Ser Asn Ile Ala Ile 
115 120 125 

Asn Asn Ala Phe Met Leu Glu Gly Ala Ile Tyr Phe Leu Leu Lys Lys 
130 135 140 

His Phe Arg Lys Gln Ser Tyr Tyr Val Asp Leu Leu Glu Leu Phe His 
145 150 155 160 

Asp Val Thr Phe Gln Thr Glu Leu Gly Gln Leu Ile Asp Leu Leu Thr 
165 170 175 

Ala Pro Glu Asp His Val Asp Leu Asp Lys Phe Ser Leu Asn Lys His 
180 185 190 

His Leu Ile Val Val Tyr Lys Thr Ala Phe Tyr Ser Phe Tyr Leu Pro 
195 200 205 

Val Ala Leu Ala Met Arg Met Val Gly Val Thr Asp Glu Glu Ala Tyr 
210 215 220 

Lys Leu Ala Leu Ser Ile Leu Ile Pro Met Gly Glu Tyr Phe Gln Val 
225 230 235 240 

Gln Asp Asp Val Leu Asp Ala Phe Arg Pro Pro Glu Ile Leu Gly Lys 
245 250 255 

Ile Gly Thr Asp Ile Leu Asp Asn Lys Cys Ser Trp Pro Ile Asn Leu 
260 265 270 

Ala Leu Ser Pro Ala Ser Pro Ala Gln Arg Glu Ile Leu Asp Thr Ser 
275 280 285 

Tyr Gly Gln Lys Asn Ser Glu Ala Glu Ala Arg Val Lys Ala Leu Tyr 
290 295 300 

Ala Glu Leu Asp Ile Gln Gly Lys Phe Asn Ala Tyr Glu Gln Gln Ser 
305 310 315 320 

Tyr Glu Ser Leu Asn Lys Leu Ile Asp Ser Ile Asp Glu Glu Lys Ser 
325 330 335 

Gly Leu Lys Lys Glu Val Phe His Ser Phe Leu Gly Lys Val Tyr Lys 
340 345 350 

Arg Ser Lys 
355 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GGNAARTAYA CNATHGGNYT NGGNCA 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 26 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
TANARNSWNS WNGTRTACAT RTTNCC 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GAAGAACCCC ATCAAAAGCC TCGA 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AAAAGCCTCG AGATCCTTGT GAGCG 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
AGAAGCCAGA AGAGAAAA 

(2) INFORMATION FOR SEQ ID NO:16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
TCCTCGAGGA AAGTAGAT 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GGTACCATAT GTATCCTTCT ACTACCGAAC 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
GCATGCGGAT CCTCAAGCAG AAGGGACCTG 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GCNTGYTGYG ARAAYGTNAT HGGNTAYATG CC 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ATCCARTTDA TNGCNGCNGG YTTYTTRTCN GT 
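
Several primers in this listing, including SEQ ID NOs: 19 and 20, are written with IUPAC nucleotide ambiguity codes (for example N, R, Y, H and D), so each entry stands for a pool of concrete oligonucleotides. The following minimal Python sketch, given as an illustration rather than as part of the disclosed method, checks the stated length of the two primers and counts how many distinct sequences each pool contains. The helper names `clean` and `degeneracy` are hypothetical.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases they may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def clean(primer: str) -> str:
    """Remove the 10-base grouping spaces used in the listing."""
    return "".join(primer.split()).upper()


def degeneracy(primer: str) -> int:
    """Number of distinct concrete sequences represented by a primer."""
    total = 1
    for base in clean(primer):
        total *= len(IUPAC[base])  # KeyError flags a non-IUPAC character
    return total


# Primer sequences copied from SEQ ID NOs: 19 and 20.
seq19 = "GCNTGYTGYG ARAAYGTNAT HGGNTAYATG CC"
seq20 = "ATCCARTTDA TNGCNGCNGG YTTYTTRTCN GT"

for name, primer in (("SEQ ID NO: 19", seq19), ("SEQ ID NO: 20", seq20)):
    print(name, len(clean(primer)), degeneracy(primer))
```

Both primers have the 32 bases stated in their LENGTH fields; under the standard IUPAC table the pools contain 6144 and 12288 sequences respectively.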



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GGCCATTCCA CACTTGATGC TCTGC 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 
GGCCGATATC TTTATGGTCC T 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
GGTACCGAAG AAATTATGAA GAGTGG 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
CTGCAGTCAG GCATCCACGT TCACAC 



(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
GCNCCNGGNA ARGTNATHYT NTTYGGNGA 



(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CCCCANGTNS WNACNGCRTT RTCNACNCC 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 17 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
ACATGCTGTA GTCCATG 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
ACTCGGATTC CATGGA 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Phaffia rhodozyma 

(B) STRAIN: ATCC96594 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
TTGTTGTCGT AGCAGTGGGT GAGAG 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 






(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
GGAAGAGGAA GAGAAAAG 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TTGCCGAACT CAATGTAG 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Phaffia rhodozyma 

(B) STRAIN: ATCC96594 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
GGATCCATGA GAGCCCAAAA AGAAGA 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Phaffia rhodozyma 

(B) STRAIN: ATCC96594 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
GTCGACTCAA GCAAAAGACC AACGAC 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
HTNAARTAYT TGGGNAARMG NGA 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: 
GCRTTNGGNC CNGCRTCRAA NGTRTANGC 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
CCGAACTCTC GCTCATCGCC 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
CAGATCAGCG CGTGGAGTGA 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CARGCNTAYT TYYTNGTNGC NGAYGA 

(2) INFORMATION FOR SEQ ID NO:39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
CAYTTRTTRT CYTGDATRTC NGTNCCDATY TT 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
ATCCTCATCC CGATGGGTGA ATACT 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 
AGGAGCGGTC AACAGATCGA TGAGC 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GAATTCATAT GTCCACTACG CCTGA 



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
GTCGACGGTA CCTATCACTC CCGCC 

Claims 

1. An isolated DNA sequence, which codes for an enzyme involved in the mevalonate pathway or the pathway from 
isopentenyl pyrophosphate to farnesyl pyrophosphate. 

2. An isolated DNA sequence according to claim 1, wherein said enzyme has an activity selected from the group consisting of 
3-hydroxy-3-methylglutaryl-CoA synthase activity, 3-hydroxy-3-methylglutaryl-CoA reductase activity, 
mevalonate kinase activity, mevalonate pyrophosphate decarboxylase activity and farnesyl pyrophosphate synthase 
activity. 

3. An isolated DNA sequence according to claim 1 or 2, which is characterized in that 

(a) the said DNA sequence codes for the said enzyme having an amino acid sequence selected from the group 
consisting of those described in SEQ ID NOs: 6, 7, 8, 9 and 10, or 

(b) the said DNA sequence codes for a variant of the said enzyme selected from (i) an allelic variant, and (ii) 
an enzyme having one or more amino acid additions, insertions, deletions and/or substitutions and having the 
stated enzyme activity. 

4. An isolated DNA sequence according to any one of claims 1-3, which can be derived from a gene of Phaffia 
rhodozyma and is selected from: 

(i) a DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5; 

(ii) an isocoding or an allelic variant for the DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5; and 

(iii) a derivative of a DNA sequence represented in SEQ ID NOs: 1, 2, 4 or 5 with addition, insertion, deletion 
and/or substitution of one or more nucleotide(s), and coding for a polypeptide having the said enzyme activity. 

5. An isolated DNA sequence, which is selected from: 

(i) a DNA sequence represented in SEQ ID NO: 3; 

(ii) an isocoding or an allelic variant for the DNA sequence represented in SEQ ID NO: 3; and 

(iii) a derivative of a DNA sequence represented in SEQ ID NO: 3 with addition, insertion, deletion and/or 
substitution of one or more nucleotide(s), and coding for a polypeptide having the mevalonate kinase activity. 

6. An isolated DNA sequence as claimed in claim 1 or 2 and which is selected from: 

(i) a DNA sequence which hybridizes under standard conditions with a sequence as shown in SEQ ID NOs: 1-10 
or its complementary strand or fragments thereof; and 

(ii) a DNA sequence which does not hybridize as defined in (i) because of the degeneracy of the genetic code 
but which codes for polypeptides having exactly the same amino acid sequence as shown in SEQ ID NOs: 1-10 
or those encoded by a DNA sequence as defined above under (i). 

7. A vector or plasmid comprising a DNA sequence as defined in any of claims 1-6. 

8. A host cell which has been transformed or transfected by a DNA sequence as claimed in any one of claims 1 to 6, 
or a vector or plasmid as claimed in claim 7. 

9. A process for producing an enzyme involved in the mevalonate pathway or the pathway from isopentenyl pyrophosphate 
to farnesyl pyrophosphate, which comprises culturing a host cell as claimed in claim 8, under the conditions 
conducive to the production of said enzyme. 

10. A process for the production of isoprenoids or carotenoids, preferably astaxanthin, which comprises cultivating a 
host cell as claimed in claim 8 under suitable culture conditions. 

Fig. 1 Biosynthetic pathway from acetyl-CoA to astaxanthin in P. rhodozyma 